According to Fabio Barone:
> I already checked the installation of acroread before writing to you.
>
> $>which acroread returns /usr/local/bin/acroread, so it is installed and
> the script should find it...
Hmm, yes, if it was in /usr/local/bin/acroread when you ran configure,
it should have found it. I can't imagine what the problem is here, unless
you're overriding pdf_parser with an invalid value in your htdig.conf.
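
One quick way to rule out an override is to look at what htdig.conf actually sets. Here is a rough Python sketch, not part of htdig, that pulls an attribute out of htdig.conf-style text and checks that the program it names exists and is executable. It assumes the usual "name: value" lines, "#" comments, and backslash continuation; the function names are made up for illustration.

```python
import os
import shlex


def find_attribute(conf_text, name):
    """Return the last value assigned to `name` in htdig.conf-style text, or None.

    Assumes "name: value" lines, "#" comment lines, and a trailing
    backslash for continuation onto the next line.
    """
    value = None
    pending = ""
    for raw in conf_text.splitlines():
        stripped = raw.strip()
        # Skip blank and comment lines unless we are inside a continuation.
        if not pending and (not stripped or stripped.startswith("#")):
            continue
        line = pending + stripped
        pending = ""
        if line.endswith("\\"):
            # Join with the next line, dropping the backslash.
            pending = line[:-1] + " "
            continue
        key, sep, rest = line.partition(":")
        if sep and key.strip() == name:
            value = rest.strip()
    return value


def parser_program_ok(value):
    """True if the first word of the attribute value is an executable file."""
    words = shlex.split(value)
    return bool(words) and os.path.isfile(words[0]) and os.access(words[0], os.X_OK)
```

Reading your htdig.conf into a string and calling find_attribute(text, "pdf_parser") should return None if there is no override, or the value to pass to parser_program_ok() if there is one.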
> Does 3.0.b4 not support external_parsers correctly? I use the statement for
> a script parsing .doc files
> (and this doesn't seem to work properly either: there are no error
> messages and everything seems OK, but
> htsearch doesn't find anything inside a Word doc....)
It should work, but there were a number of problems due to lack of error
checking. This wouldn't stop it from working with a well-behaved external
parser, but if the parser spat out garbage, it could make htdig crash.
That alone is reason enough to upgrade.
Also, that version didn't support meta tags in external parsers.
That's probably not a big deal, as I don't know of any external parser
script that uses that.
If your Word documents aren't being indexed properly, there must be
something wrong with your external parser script. Try running it manually
on some of your Word documents to see what sort of output it spits out.
If your external parser is parse_word_doc.pl, or some variation of it
that uses catdoc, you can replace it with my latest parse_doc.pl script
(URL below), which handles MS Word (with catdoc), PostScript (with gs),
and PDFs (with xpdf). Also, if you're not running the latest version
of catdoc, and the version you're running can't deal with your Word
documents, you may want to upgrade catdoc too.
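
If you capture the parser's output to a file when you run it manually, a small checker can tell you whether it looks sane. The sketch below is only an illustration and is not something htdig provides. It assumes the tab-separated record layout as I recall it from the external_parsers documentation (w = word, u = URL, t = title, h = head, a = anchor, i = image), and the field counts in the table are my assumption, so check them against the docs for your version.

```python
# Record type -> expected number of tab-separated fields (including the
# type letter itself). These counts are an assumption based on the
# external_parsers documentation; "m" (meta) records are left out on purpose.
KNOWN_RECORDS = {"w": 4, "u": 3, "t": 2, "h": 2, "a": 2, "i": 2}


def check_parser_output(text):
    """Return (word_count, problems) for captured external-parser output."""
    words = 0
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line:
            continue
        fields = line.split("\t")
        expected = KNOWN_RECORDS.get(fields[0])
        if expected is None:
            problems.append(f"line {number}: unknown record type {fields[0]!r}")
        elif len(fields) != expected:
            problems.append(
                f"line {number}: expected {expected} fields, got {len(fields)}"
            )
        elif fields[0] == "w":
            words += 1
    return words, problems
```

A word count of zero, or a long list of problems, would point at the parser script (or catdoc) rather than at htdig itself.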
> > If you don't have acroread, you can use an external parser instead.
> > You can use the latest version of the parse_doc.pl script as an external
> > parser for files of the application/pdf type. It uses pdftotext (from
> > the xpdf 0.80 package) to extract the text from the PDF file, and formats
> > the text as required by htdig. If you're going to use external parsers,
> > you really ought to upgrade to htdig 3.1.1, though, because a lot of fixes
> > have gone into external parser support recently. The latest version of
> > parse_doc.pl can be taken from:
> >
> > http://www.scrc.umanitoba.ca/htdig/rpms/parse_doc.pl
> >
> > and documentation on external parsers is at:
> >
> > http://www.htdig.org/attrs.html#external_parsers
--
Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930