According to Christian Fredrickson:
> Well, I tried this to no avail. I still receive no errors, but do see:
> Deleted, no excerpt:
> for every PDF file. All my Word docs are parsed fine using doc2html. Yes,
> this is version 3.1.6. Any other ideas? This is driving me nuts, and many
> documents are in PDF format, so I have to have them parsed.

OK, but have you determined for sure that htdig is actually
calling doc2html for PDF files too, or only for Word docs?
What does your external_parsers attribute setting look like?
Are you sure there aren't any problems with it? (See
http://www.htdig.org/FAQ.html#q5.31.)
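
For comparison, a setting that sends both Word and PDF files through
doc2html would look something like this. It is only a sketch: the
/usr/local/bin/doc2html.pl path is an assumption, so use wherever you
actually installed doc2html.

```
# htdig.conf excerpt (sketch; adjust the doc2html path for your system).
# Both MIME types are part of ONE external_parsers value, joined by a
# trailing backslash.
external_parsers: application/msword->text/html /usr/local/bin/doc2html.pl \
                  application/pdf->text/html /usr/local/bin/doc2html.pl
```

Two things to check against your own file: both entries have to be in a
single external_parsers value, since a second external_parsers line
replaces the first rather than adding to it; and the MIME type on the
left must match the Content-Type your server actually sends for the PDF
files.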

It would really help to see the output of htdig -ivvvv
when start_url is the URL of a single PDF file. That would
tell us a lot about what htdig is actually doing when it
fetches a PDF file.
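
One way to set that up without disturbing your real index is a copy of
your htdig.conf with just a few attributes changed, for example (the URL
and directory below are placeholders, not real values, and the directory
must already exist):

```
# test.conf: a copy of htdig.conf with only these lines changed
start_url:      http://www.example.com/docs/sample.pdf
limit_urls_to:  ${start_url}
# keep the test away from the live databases, since -i wipes them
database_dir:   /tmp/htdig-test
```

Then run htdig -ivvvv -c /path/to/test.conf and post the output.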

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
