Re: [htdig] Binary summary with PDF's

Gilles Detillieux Fri, 12 Oct 2001 12:19:29 -0700

According to Curtis J. Peredina:
> With the default search results, when the result is a PDF, I get the
> following:
> [nxtrend.pdf] <stars here>
>      n6PYmD&'Z8P=@R].E'SKC-pLR:I[gjIO(S0?,*o+U!Qo=T%Oi*TJ 
>      <appropriate URL is fine here>
> 
> What step generates the summary characters? I've been debugging this for
> a while to no avail. I've combed the FAQ, but it handles binary return
> strings in the more general sense. All other docs seem ok. This just
> affects PDF's.


The excerpt is generally collected by the same procedure that collects
words to be indexed.  The possible exception to this is an external parser
script (as opposed to an external converter), which can generate the
"h" record completely independently from the "w" records.  In a later
message, you indicated that you're using acroread:

> pdf_parser:             /usr/local/Acrobat/4/bin/acroread

Well, Acrobat 4 has a great many problems, but if it's not crashing
on you, then other tools for reading PDFs may not have any more success
with these files.  I'm guessing that your PDFs use strange font
encodings.  Still, it can't hurt to try pdftotext, and if that works,
use it with an external converter script like doc2html.
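
One rough way to test whether pdftotext does any better on a given file
is to run it by hand and see how much of its output looks like ordinary
words.  The sketch below assumes pdftotext is on your PATH and accepts
"-" as the output file to write to stdout.  The function names and the
0.5 threshold are just illustrative choices of mine, not anything ht://Dig
uses.

```python
import string
import subprocess
import sys


def wordlike_ratio(text):
    """Return the fraction of whitespace-separated tokens that are plain
    alphabetic words once leading/trailing punctuation is stripped."""
    tokens = text.split()
    if not tokens:
        return 0.0
    wordlike = sum(1 for t in tokens if t.strip(string.punctuation).isalpha())
    return wordlike / len(tokens)


def extract_text(pdf_path):
    """Run pdftotext on pdf_path and return its output as a string.
    Assumes a pdftotext binary on PATH that writes to stdout for "-"."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


# Only runs when invoked as: python check_pdf.py somefile.pdf
if __name__ == "__main__" and len(sys.argv) == 2:
    ratio = wordlike_ratio(extract_text(sys.argv[1]))
    # 0.5 is an arbitrary cutoff chosen for illustration.
    verdict = "looks like text" if ratio >= 0.5 else "looks like garbage"
    print(f"{sys.argv[1]}: {ratio:.2f} word-like tokens, {verdict}")
```

If the ratio comes out near zero for the same PDFs that give you garbage
excerpts, then pdftotext is probably tripping over the same encoding
problem, and switching converters won't help for those files.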

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
