Hello,
We have a search that returns a PDF file as the best hit. But the PDF file is an
image, not text, so I don't know how htDig is finding it. I have customers who want to
know how it works so they can reproduce it. We are using doc2html and pdftotext 0.91. When
I run the file through doc2html, all I get is gibberish. The search is
http://wwwindex.nlm.nih.gov/cgi/htsearch?config=www_exact;method=or;format=builtin-long;words=what%20would%20like;page=1
and the PDF file is the first link (bbaaaa.pdf).
Can anyone explain how this file gets indexed? (I'd love to tell them htDig has OCR,
but they wouldn't believe it.)
Thanks,
Terry Luedtke
National Library of Medicine
_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html