According to Victor:
> G'day,
> I'm fairly new to the htdig environment, and am quite happy with the
> service that htdig has provided me. Although, htdig is quite touchy (
> it bus cores on me at least once a week ), I feel that the product is
> quite stable.
Really? I thought we had nailed most of those down in 3.1.1. Could you
get a stack backtrace when it dumps core? I'd like to know where it
fails and why. We've been working hard at making it stable, but if it's
still dumping core, we've evidently missed a few trouble spots.
> One problem though. I'm using htdig 3.1.1 for SUNOS 4.1.4, built with
> gnu 2.7.2.3. Every time I try to dig pdf files, acroread is invoked to
> the $DISPLAY.
>
> I've checked the source and decided that my acroread requires a
> -toPostScript flag.
Between 3.1.0b4 and 3.1.0 (final), we changed the way acroread is called.
Now, the pdf_parser attribute must include the location of acroread,
plus the options " -toPostScript -pairs". This caused a bit of grief
for people who explicitly defined pdf_parser in their htdig.conf, rather
than using the compiled-in default.
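For example, if you do set pdf_parser yourself, the htdig.conf entry
would look something like the line below. The /usr/local/bin path is
only an assumption for illustration; substitute wherever acroread
actually lives on your system.

    # htdig.conf (path to acroread is assumed, adjust to your install)
    pdf_parser: /usr/local/bin/acroread -toPostScript -pairs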
> After some tweaking here and there, I decided that
> it would be best to go with xpdf pdftotext.
>
> One thing that I noticed is that the htdig-3.1.1/contrib/parse_doc.pl is
> quite old and does not explain where to get the xpdf/pdftotext from. Is
> there a reason why contrib/parse_doc.pl has not been updated with what I
> consider to be a working parse_doc.pl from
>
> http://www.scrc.umanitoba.ca/htdig/rpms/parse_doc.pl (Thanks Gilles!)
>
> Also, the FAQ, is quite confusing on this subject. I think it would be
> easier to understand the process with a working parse_doc.pl, IMHO :)
The reason is that 3.1.1 was released in mid-February, and I changed
parse_doc.pl to handle PDFs between late February and mid-March, after
giving up on acroread and htdig's built-in PDF support. The 3.1.2 release,
just out yesterday, includes the latest parse_doc.pl in the contrib
directory, and a few new FAQ entries, including one explaining how to use
parse_doc.pl as an external parser for PDFs.
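As a rough sketch of what that setup looks like, the external_parsers
attribute in htdig.conf pairs a content type with the program that
handles it. The install path below is an assumption, and parse_doc.pl
itself needs to be edited to point at your copy of pdftotext; the FAQ
entry has the details.

    # htdig.conf (install path of parse_doc.pl is assumed)
    external_parsers: application/pdf /usr/local/bin/parse_doc.pl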
--
Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930