>I have switched to using conv_doc.pl to parse my PDF files. I have run
>this and pdftotext to make certain the output was text and everything
>ran correctly. It all works perfectly, but when running htdig I see:
>Deleted, no excerpt: 7/http://
>for all the PDF files. Why? I need to have the PDF documents parsed; I
>get correct data when running conv_doc.pl, but nothing with htdig.
>

I presume we are talking about version 3.1.6 here? I had a lot of difficulty
running external parsers with this version, with the same sort of symptom,
i.e. excerpt deleted. I disabled the acroread invocation (which had worked, as
above, when invoked manually to test) and moved directly to pdf2html.pl,
as below.

Curiously, we have only been able to get external parsers to work if they
are invoked from a script, as below. Our attempts to run executables directly
(as in the disabled acroread example below) all result in the above
symptom, so we now call the executables from a small script which calls
them with four arguments. I might mention that we did not get this problem
with v3.1.4, and currently remain baffled as to the difference.
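
For illustration only, here is a minimal sketch of the kind of wrapper script described above. It is not the script we actually use. It assumes htdig passes four arguments in the order input file, content type, URL, config file, and that pdftotext is on the PATH; the function names and the HTML layout are hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical wrapper: turn a PDF into minimal HTML for an htdig external parser."""
import html
import subprocess
import sys


def text_to_html(text, title):
    """Wrap plain text in a minimal HTML page, escaping markup characters."""
    return (
        "<html><head><title>{}</title></head>\n"
        "<body><pre>\n{}\n</pre></body></html>\n"
    ).format(html.escape(title), html.escape(text))


def main(argv):
    # Assumed argument order: input file, content type, URL, config file.
    if len(argv) != 5:
        sys.stderr.write("usage: wrapper infile content-type url config\n")
        return 1
    infile, _content_type, url, _config = argv[1:5]
    # "pdftotext FILE -" writes the extracted text to standard output.
    result = subprocess.run(
        ["pdftotext", infile, "-"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if result.returncode != 0:
        sys.stderr.write(result.stderr)
        return result.returncode
    # The URL stands in for a title, since the extracted text has none.
    sys.stdout.write(text_to_html(result.stdout, url))
    return 0


# Only act when called with arguments, as htdig would call it.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Such a script would be named in external_parsers in place of the converter executable itself.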

The following is from our conf file:

#pdf_parser: /usr/adobe/Acrobat4.0/bin/acroread -toPostScript
external_parsers: application/pdf->text/html /var/www/htdig/scripts/doc2html/pdf2html.pl

-- 

Henry Rzepa.
+44 (0870) 132 3747 (eFax) +44 0778 6268 220 (Mobile)
 http://www.ch.ic.ac.uk/rzepa/ Dept. Chemistry, Imperial College, London, SW7  2AY, UK.


_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
