Hello,

We have a search that returns a PDF file as the best hit. But the PDF file is an 
image, not text, so I don't know how htDig is matching it. I have customers who want to 
know how it does this so they can reproduce it. We are using doc2html and pdftotext 0.91. 
When I run the file through doc2html myself, all I get is gibberish. The search is:
 
http://wwwindex.nlm.nih.gov/cgi/htsearch?config=www_exact;method=or;format=builtin-long;words=what%20would%20like;page=1
 
and the PDF file is the first link (bbaaaa.pdf). 
 
Can anyone explain how this file gets indexed? (I'd love to tell them htDig has OCR, 
but they wouldn't believe it.)
 
Thanks,
Terry Luedtke
National Library of Medicine


