Re: [htdig] external parser/problem indexing PDF files

David Adams Mon, 13 Aug 2001 07:46:30 -0700

On Thu, 9 Aug 2001 11:38:43 -0700  Jason Small 
<[EMAIL PROTECTED]> wrote:

> Hello...and help!
> 
> I am running htdig-3.1.5 on apache 1.3.  I am trying to parse/index PDF
> files and have had no success to date.  I am using the pdftotext and pdfinfo
> utilities from xpdf-0.92 with pdf2html.pl. When I execute the pdf2html.pl
> script from the command line, I receive html output.  However, when I try to
> call the script through rundig, it appears to ignore the external_parsers
> specification (external_parsers:
> "application/pdf->text/html"/local/apache/cgi-bin/pdf2html.pl).  I tried
> modifying the external_parsers line as follows "application/pdf;
> charset=iso-8859-1->text/html"... but this resulted in it looking for
> acroread.  Either way, I get the "deleted, no excerpt" message and none of
> the pdf files get indexed.  The files DO contain text and max_doc_size is
> set to a value larger than the largest pdf file.  
> 
> Is it possible that there is a setting at the server level that needs
> adjusting?  I have tested everything I can think of relating to the htdig
> configuration file (and have read through every e-mail in the archives that
> I could find).
> 
> Attached is a text file of the output I received from (./rundig -vvv)
> 
> Any help would be greatly appreciated!
> Jason

Could you quote the *exact* lines in your configuration 
file with the external_parsers attribute?

It should be something like:

external_parsers: application/pdf->text/html /local/apache/cgi-bin/pdf2html.pl
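
To spell out the shape I mean, here is a minimal sketch of how I
understand the 3.1.x syntax. Each entry is a pair: the type string
(written type1->type2 with no spaces inside it), then whitespace, then
the path to the program. The line as quoted in your message shows no
space between the closing quote and the path, which may only be how it
was pasted. The second entry below is purely hypothetical and is there
only to show how a list continues across lines.

```
# Sketch only: the doc2text path is a made-up placeholder.
# Each entry is a whitespace-separated pair:
#   source-type->target-type   /path/to/converter
# A trailing backslash continues the list onto the next line.
external_parsers: application/pdf->text/html /local/apache/cgi-bin/pdf2html.pl \
                  application/msword->text/plain /hypothetical/path/doc2text
```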

----------------------
David Adams
[EMAIL PROTECTED]


_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html