Use conv_doc.pl instead of parse_doc.pl.
Get it from http://www.htdig.org/files/contrib/parsers/conv_doc.pl.gz,
gunzip it, and move it to /usr/local/bin.
Get xpdf (which provides pdftotext) from ftp://ftp.foolabs.com/pub/xpdf/xpdf-0.91.tgz.
Get ps2ascii from your Ghostscript installation.
Then put this in your conf/htdig.conf:
external_parsers: application/msword->text/html /usr/local/bin/conv_doc.pl \
                  application/postscript->text/html /usr/local/bin/conv_doc.pl \
                  application/pdf->text/html /usr/local/bin/conv_doc.pl
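
Before re-running rundig, it may be worth checking that the converter and its helper programs are actually where htdig will look for them. The following is only a rough sketch: missing_tools is a made-up helper name, and the tool list assumes conv_doc.pl sits in /usr/local/bin as above and that pdftotext (from xpdf) and ps2ascii are on your PATH. Adjust it to your setup.

```shell
#!/bin/sh
# missing_tools (hypothetical helper): print each argument that cannot be
# found, one per line. Absolute paths are tested for being executable;
# bare names are looked up on PATH.
missing_tools() {
    for tool in "$@"; do
        case "$tool" in
            /*) [ -x "$tool" ] || printf '%s\n' "$tool" ;;
            *)  command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool" ;;
        esac
    done
    return 0
}

# Assumed locations and names, taken from the steps above.
missing_tools /usr/local/bin/conv_doc.pl pdftotext ps2ascii
```

If this prints nothing, all three were found. Anything it does print is a candidate for why a document type is not being converted.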
On Wed, 1 Nov 2000, Roy Stephane wrote:
> I have problems indexing PDF files. I have already considered FAQ 4.9
> and 5.2, so all my paths are OK and the MAX_DOC_SIZE parameter is greater
> than my largest PDF file. I am working with the external parser
> "parse_doc.pl".
>
> When I run rundig in verbose mode, I see that htdig recognises all my
> PDF files and shows their sizes. After that, when htmerge finds a PDF, it says
> that there is no excerpt, so the file (a temporary file) is deleted.
>
> I tried to find the parameters that are used to call htdig from rundig.
> Since an output command on each variable shows nothing on screen, I assume
> that all the parameters used have null values.
>
> I am using Red Hat 6.2 and Apache 1.3.
>
> Thanks for your help!
>
> Stéphane Roy
> [EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>
> (450) 542-5906
>
> ------------------------------------
> To unsubscribe from the htdig mailing list, send a message to
> [EMAIL PROTECTED]
> You will receive a message to confirm this.
> List archives: <http://www.htdig.org/mail/menu.html>
> FAQ: <http://www.htdig.org/FAQ.html>
>
>