[htdig] PDF files indexing...

Thierry FLORAC Wed, 03 Oct 2001 04:21:40 -0700

Hi,


I'm currently using ht/dig-3.1.5 to index information stored on a Debian
GNU/Linux Apache server.
My problem(s?) is that I can't index PDF files correctly. The symptoms are
as follows when running "rundig -a -v":

  ...
  26:26:1:http://dsi.onf.fr/docs/rapcarcenac.pdf:  size = 448512
  ...
  Deleted, no excerpt: 26/http://dsi.onf.fr/docs/rapcarcenac.pdf
  ...

This error is displayed for every PDF file.
What does this message mean?

My htdig.conf looks like this:

  max_doc_size:           20000000
  external_parsers: \
                application/msword /usr/share/htdig/parse_doc.pl \
                application/postscript /usr/share/htdig/parse_doc.pl \
                application/pdf /usr/share/htdig/parse_doc.pl

My parse_doc.pl script is configured to parse PDF files with pdftotext,
which is installed as part of the xpdf-i package, but ht/dig seems to
always use acroread, except when I define a "pdf_parser" option in
htdig.conf.
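
One way to narrow this kind of problem down is to run the converter by hand, outside ht/dig, and see whether it produces any text at all for the PDF in question. The following is only a rough sketch of such a check: the helper names (build_command, has_text, check_pdf) are hypothetical and not part of ht/dig or parse_doc.pl, and it assumes that pdftotext accepts "-" as the output file name to write the extracted text to stdout.

```python
import shutil
import subprocess


def build_command(pdf_path, converter="pdftotext"):
    # Assumption: "-" as the output file makes pdftotext write to stdout.
    return [converter, pdf_path, "-"]


def has_text(output):
    # True when the converter produced at least one non-whitespace character.
    return bool(output.strip())


def check_pdf(pdf_path, converter="pdftotext"):
    # Returns None when the converter is not found on PATH,
    # otherwise True/False depending on whether usable text came out.
    if shutil.which(converter) is None:
        return None
    result = subprocess.run(
        build_command(pdf_path, converter),
        capture_output=True,
        text=True,
        errors="replace",  # tolerate bytes that do not decode cleanly
    )
    return result.returncode == 0 and has_text(result.stdout)
```

Calling check_pdf("rapcarcenac.pdf") on a local copy of the file would return False if the converter ran but extracted nothing, which would point at the PDF or the converter rather than at the ht/dig configuration.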

Could anyone help me debug this? I've tried a lot of things I read in
the mailing list archives, without any good result...

Thanks,

  Thierry

