According to Christian Fredrickson:
> Well, I tried this to no avail. I still receive no errors, but do see:
> Deleted, no excerpt:
> for every PDF file. All my Word docs are parsed fine using doc2html. Yes
> this is version 3.1.6. Any other ideas? This is driving me nuts and many
> documents are PDF format so I have to have them parsed.

OK, but have you determined for sure that htdig is actually calling
doc2html for PDF files too, or is it only doing so for Word docs? What
does your external_parsers attribute setting look like? Are you sure
there aren't any problems with it (see
http://www.htdig.org/FAQ.html#q5.31)?

It would really be helpful to see the output of htdig -ivvvv when
start_url is the URL for a single PDF file. That would tell us a lot
about what htdig is actually doing when it gets a PDF file.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
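
For illustration only, a test configuration for the single-PDF run suggested above might look roughly like the sketch below. It assumes doc2html is installed as /usr/local/bin/doc2html.pl and uses a placeholder URL; neither value comes from the original report, so substitute your own. The point is that external_parsers needs an application/pdf entry alongside the application/msword one, otherwise htdig has no reason to hand PDF files to doc2html.

```
# Hypothetical test config: the URL and the script path are placeholders,
# not values taken from the original poster's setup.
start_url:         http://www.example.com/docs/sample.pdf

# One entry per MIME type; a trailing backslash continues the attribute
# value onto the next line.
external_parsers:  application/msword->text/html /usr/local/bin/doc2html.pl \
                   application/pdf->text/html /usr/local/bin/doc2html.pl
```

If this were saved as, say, test.conf (a made-up name), running htdig -ivvvv -c test.conf should show in the verbose output whether the external parser is invoked for the PDF.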

