[posted and mailed]

Dominique Fourtune wrote:
>
> I'm using htdig 3.1.6 to parse HTML pages created by Apache mod_autoindex.
> I can't merge PDF files: I always get the error message "Deleted, no excerpt".
> I'm using doc2html.pl; it works for .doc files, but not for PDF files.
> pdf2html.pl, run on the command line, parses PDF files and creates HTML files.
>
> I found this old post:
| According to Paul COURBIS:
| > When I run htmerge, I get a lot of messages:
| > Deleted, no excerpt: xxx/http...
| >
| > What does it mean? Why does htmerge remove so many documents from the
| > database? As far as I understand English, it seems to mean that there
| > are no keywords for these pages, even though there's a lot of text
| > when I connect to them...
|
| The most common causes of this are:
| - a noindex directive somewhere in the document
| - the document was disallowed by robots.txt
| - the server_max_docs limit was reached before this document could be parsed
|
| You'd need to correlate the htmerge -v output back to the htdig -v (or -vv)
| output to see which of these conditions occurred.
>
> I think the first reason is the right one (I have no robots.txt), but I
> need help to go further: what is a noindex directive?

http://www.htdig.org/attrs.html#noindex_start

But I'd rather suspect max_doc_size
(see http://www.htdig.org/attrs.html#max_doc_size).

cu,
  Martin
-- 
One OS to rule them all       | Martin Vorlaender  |  VMS & WNT programmer
One OS to find them           | work: [EMAIL PROTECTED]
One OS to bring them all      | http://www.pdv-systeme.de/users/martinv/
And in the Darkness bind them.| home: [EMAIL PROTECTED]
_______________________________________________
ht://Dig general mailing list: <[EMAIL PROTECTED]>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general
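Martin's max_doc_size suspicion can be checked offline. ht://Dig reads at most max_doc_size bytes of each document, so a larger PDF may arrive truncated and the converter may then fail to extract any text, leaving htmerge with no excerpt. The sketch below assumes the PDFs also exist as local files under some document root, and that you pass it the max_doc_size value in effect in your htdig.conf. The function name oversized_pdfs and the command-line usage are hypothetical helpers, not part of ht://Dig.

```python
import os
import sys


def oversized_pdfs(root, max_doc_size):
    """Return sorted (path, size) pairs for every .pdf file under root
    whose size in bytes exceeds max_doc_size (hypothetical helper)."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Match .pdf case-insensitively (.pdf, .PDF, ...)
            if name.lower().endswith(".pdf"):
                path = os.path.join(dirpath, name)
                size = os.path.getsize(path)
                # A file of exactly max_doc_size bytes is still read whole
                if size > max_doc_size:
                    hits.append((path, size))
    return sorted(hits)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Assumed usage: python oversized_pdfs.py /path/to/docroot <max_doc_size>
    for path, size in oversized_pdfs(sys.argv[1], int(sys.argv[2])):
        print(f"{size:>10}  {path}")
```

If it lists the PDFs that htmerge deletes, raising max_doc_size in htdig.conf above the largest reported size (for example `max_doc_size: 5000000`) and re-running htdig and htmerge would be the next thing to try.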

