Your config file looks OK, but check that you don't have a space after any
of those end-of-line \ characters; the \ must be the very last character on
the line for the entry to continue.
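
Trailing spaces are hard to spot by eye, so here is a minimal sketch of an
automated check. It is only an illustration: the function name
broken_continuations is hypothetical, and the script takes the config file
path as its argument.

```python
import re
import sys

# A backslash followed by one or more spaces or tabs at the end of a line.
TRAILING = re.compile(r"\\[ \t]+$")

def broken_continuations(text):
    """Return 1-based numbers of lines where whitespace follows the final backslash."""
    return [n for n, line in enumerate(text.splitlines(), start=1)
            if TRAILING.search(line)]

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_conf.py /path/to/htdig.conf
    with open(sys.argv[1]) as conf:
        for n in broken_continuations(conf.read()):
            print(f"line {n}: whitespace after trailing backslash")
```

If it prints nothing, the continuation lines are clean.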

Have you checked that the /usr/share/htdig/parse_doc.pl script runs OK from
the command line and does extract text from the .PDF files in question?
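
A rough sketch of such a manual test follows. It assumes the usual external
parser calling convention (input file, content type, URL, config file, in
that order) and that each word comes back as a tab-separated record starting
with "w"; please check both against the documentation for your version. The
helper names, the placeholder URL and the /etc/htdig/htdig.conf path are
assumptions for illustration only.

```python
import subprocess
import sys

def parser_command(parser, infile, content_type, url, config):
    """Build the argv list: parser, then infile, content-type, URL, config file."""
    return [parser, infile, content_type, url, config]

def count_records(output, kind="w"):
    """Count output lines whose first tab-separated field equals `kind`."""
    return sum(1 for line in output.splitlines()
               if line.split("\t", 1)[0] == kind)

if __name__ == "__main__" and len(sys.argv) > 1:
    cmd = parser_command(
        "/usr/share/htdig/parse_doc.pl",  # path taken from the quoted htdig.conf
        sys.argv[1],                      # a local copy of one of the PDF files
        "application/pdf",
        "http://localhost/test.pdf",      # placeholder URL (assumption)
        "/etc/htdig/htdig.conf",          # assumed config file location
    )
    # latin-1 decoding never fails, whatever bytes the parser emits.
    result = subprocess.run(cmd, capture_output=True, encoding="latin-1")
    print("exit status:", result.returncode)
    print("word records:", count_records(result.stdout))
    print(result.stderr, end="")
```

A count of zero word records, or error text from the parser, would suggest
that no text is being extracted from the file.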

In the long run you should consider switching to an external converter
rather than parse_doc.pl.
The doc2html.pl script will provide more diagnostic information, including
how many characters it has extracted from each document.

--
David Adams
Computing Services
Southampton University


----- Original Message -----
From: "Thierry FLORAC" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, October 03, 2001 12:42 PM
Subject: [htdig] PDF files indexing...


>
>   Hi,
>
> I'm currently using ht/dig-3.1.5 to index information stored on a Debian
> GNU/Linux Apache server.
> My problem is that I can't index PDF files correctly. The symptoms are
> as follows when running "rundig -a -v":
>
>   ...
>   26:26:1:http://dsi.onf.fr/docs/rapcarcenac.pdf:  size = 448512
>   ...
>   Deleted, no excerpt: 26/http://dsi.onf.fr/docs/rapcarcenac.pdf
>   ...
>
> This error is displayed for every PDF file.
> What does this message mean?
>
> My htdig.conf looks like this :
>
>   max_doc_size:           20000000
>   external_parsers: \
>                 application/msword /usr/share/htdig/parse_doc.pl \
>                 application/postscript /usr/share/htdig/parse_doc.pl \
>                 application/pdf /usr/share/htdig/parse_doc.pl
>
> My parse_doc.pl script is configured to parse PDF files with pdftotext,
> which is installed as part of the xpdf-i package, but ht/dig seems to
> always use acroread, except when I define a "pdf_parser" option in
> htdig.conf.
>
> Could anyone help me to debug this? I've tried a lot of things read in
> the mailing list archives, without any good result...
>
> Thanks,
>
>   Thierry
>
> _______________________________________________
> htdig-general mailing list <[EMAIL PROTECTED]>
> To unsubscribe, send a message to <[EMAIL PROTECTED]> with a
> subject of unsubscribe
> FAQ: http://htdig.sourceforge.net/FAQ.html
>


