At 1:15 PM -0400 1/11/99, Geoff Hutchison wrote:
>At 11:35 AM -0400 1/11/99, Rick Wiggins wrote:
>
>>Perhaps future versions of 'htdig' can generalize the 'pdf_parser'
>>attribute such that this modification would not be necessary when using
>>programs other than Acroread.  Just a thought...
>
>Future versions will do so (see the TODO.html file). However, see below.
>
>>comes with a 'pdftops' utility program.  To use this program, I had to
>>modify 'htdig' so that it wouldn't include the '-toPostScript' command
>>option and would completely specify the output filename, like this:
>
>Mm. Last time this came up, when the PDF parser was first included, I was
>given a pretty definitive answer from Michael J. Long <[EMAIL PROTECTED]>:
>
>>I have looked at the output from acroread and from xpdf's version of
>>pdftops and they differ slightly.  Sylvain's PDF module uses acroread
>>specific tags (BT and ET) to determine where to start searching for
>>words to index.  Unfortunately, pdftops does not insert these tags into
>>the PostScript output.
>>
>>Therefore, the PDF module will not work with pdftops as is.  I have some
>>theories on how to tweak the PDF module to work with both:
>>      - convert the pdf to ps and use the Postscript module to
>>        parse it (looking at the way the modules work, I don't
>>        know if this is possible, I haven't looked at it that much
>>        though)
>>      - convert the pdf to text and parse the text
>>      - improve the parsing capability by stealing code from
>>        the Postscript module
>
>Now if the situation has changed, let me know. In the meantime, I'm not
>going to suggest using xpdf. I'd rather not suggest acroread since it's not
>open source. But...

Interesting.  'pdftops' seems to be working fine for me. :-/  I'm using
version 0.80 of 'xpdf' which came out on Nov. 27, 1998.  Perhaps this
problem has been corrected in this version?  We'll be indexing a large
number of PDFs in the near future.  I'll report back how it goes using
'pdftops'...
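
For anyone who wants to check their own converter output before a large
indexing run, here is a rough sketch of the kind of test described in the
quoted message: look for standalone BT and ET markers in the PostScript and
collect the words found between them. Only the marker names come from the
message above. The function names are hypothetical, and the idea that the
text sits in parenthesised strings between the markers is my assumption.
This is not the code from htdig's PDF module.

```python
import re

# Standalone BT / ET tokens. The marker names come from the quoted message;
# the patterns and helper names below are assumptions for illustration only.
_BT = re.compile(r"(?<!\S)BT(?!\S)")
_ET = re.compile(r"(?<!\S)ET(?!\S)")
# A parenthesised string without nested parentheses (backslash escapes allowed).
_STRING = re.compile(r"\(((?:\\.|[^()\\])*)\)")


def has_text_markers(ps_text):
    """Return True if both a standalone BT and a standalone ET token appear."""
    return bool(_BT.search(ps_text)) and bool(_ET.search(ps_text))


def words_between_markers(ps_text):
    """Collect words from parenthesised strings found between BT and ET."""
    words = []
    pos = 0
    while True:
        start = _BT.search(ps_text, pos)
        if not start:
            break
        end = _ET.search(ps_text, start.end())
        if not end:
            break
        block = ps_text[start.end():end.start()]
        for match in _STRING.finditer(block):
            words.extend(re.findall(r"[A-Za-z0-9]+", match.group(1)))
        pos = end.end()
    return words
```

If has_text_markers() comes back False for a converted file, a parser that
keys on BT and ET would presumably find nothing to index in it, which would
match the behaviour described in the quoted message.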

Rick


----------------------------------------------------------------------
To unsubscribe from the htdig mailing list, send a message to
[EMAIL PROTECTED] containing the single word "unsubscribe" in
the body of the message.