Title: Re: Coredump when indexing pdf files with Htdig 3.16  
> The "Deleted, no excerpt:" message may be due to pdf2html.pl failing to work, or there being no
> text that can be extracted from the .pdf file.
 
> Have you tried executing pdf2html.pl from the command line to see what
> output you get with these .pdf files?
 
> David Adams
 
Thanks, David, for your response.

I ran the test again and I think I know what the problem is (but I don't have a solution yet).
When I run pdf2html.pl from the command line on some PDFs (for example, Word documents converted with Acrobat), everything works fine.
The problem occurs when a PDF file is only the result of scanning a paper document, so it contains no text that can be extracted (thanks for the idea!).
The extracted output is:
<HTML>
<HEAD>
<TITLE> ADOBE....</TITLE>
</HEAD>
<BODY>
</BODY>
</HTML>
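One way to spot these scanned files automatically is to check whether the <BODY> of the pdf2html.pl output contains any text at all. Below is a minimal Python sketch, assuming the output looks like the sample above; the names `body_has_text` and `_BodyTextCollector` are hypothetical helpers, not part of ht://Dig or pdf2html.pl.

```python
from html.parser import HTMLParser


class _BodyTextCollector(HTMLParser):
    """Hypothetical helper: collects non-whitespace text found inside <body>."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <BODY> arrives as "body".
        if tag == "body":
            self.in_body = True

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False

    def handle_data(self, data):
        # Ignore text outside <body> (e.g. the <TITLE>) and pure whitespace.
        if self.in_body and data.strip():
            self.chunks.append(data.strip())


def body_has_text(html):
    """Return True if the <body> of pdf2html.pl-style output holds any text."""
    parser = _BodyTextCollector()
    parser.feed(html)
    parser.close()
    return bool(parser.chunks)


sample = """<HTML>
<HEAD>
<TITLE> ADOBE....</TITLE>
</HEAD>
<BODY>
</BODY>
</HTML>"""

print(body_has_text(sample))  # False: only the title carries text
```

The title is deliberately ignored, since a scanned PDF can still carry producer metadata there while having no indexable body text.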

What can I do?
One solution could be to list these files in exclude_urls in htdig.conf, but that could take a very long time...
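If the list of scanned files can be collected somehow, the exclude_urls line itself can be generated rather than typed by hand. In ht://Dig, exclude_urls is a space-separated list of patterns, and any URL containing one of them is skipped; its default value is `/cgi-bin/ .cgi`. The sketch below assumes that behaviour, keeps those defaults (overriding the attribute would otherwise drop them), and uses a hypothetical helper name `build_exclude_urls` and made-up example paths.

```python
def build_exclude_urls(patterns, defaults=("/cgi-bin/", ".cgi")):
    """Hypothetical helper: build an htdig.conf exclude_urls line.

    Each pattern is matched as a substring of a URL by ht://Dig, and the
    list is space-separated, so patterns containing whitespace are rejected.
    """
    entries = list(defaults)
    for p in patterns:
        if not p or any(ch.isspace() for ch in p):
            raise ValueError(f"pattern must be non-empty without whitespace: {p!r}")
        if p not in entries:  # avoid duplicate entries
            entries.append(p)
    return "exclude_urls: " + " ".join(entries)


# Example paths are made up for illustration.
scanned = ["/docs/scan1.pdf", "/docs/scan2.pdf"]
print(build_exclude_urls(scanned))
# exclude_urls: /cgi-bin/ .cgi /docs/scan1.pdf /docs/scan2.pdf
```

Because exclude_urls matches substrings, a shorter pattern (for instance a directory name shared by all scanned files) would exclude many files with a single entry.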


Herve

