Well, I tried this to no avail. I still receive no errors, but I do see:
Deleted, no excerpt:
for every PDF file. All my Word docs are parsed fine using doc2html. Yes,
this is version 3.1.6. Any other ideas? This is driving me nuts, and many
of my documents are in PDF format, so I have to have them parsed.

Chris

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]] On Behalf Of Rzepa, Henry
Sent: Thursday, April 04, 2002 12:12 AM
To: [EMAIL PROTECTED]
Subject: Re: [htdig] PDF problems


>I have switched to using conv_doc.pl to parse my PDF files. I have run
>this and pdftotext by hand to make certain the output was text and that
>everything ran correctly. It all works perfectly, but when running htdig I see:
>Deleted, no excerpt: 7/http://
>for all the PDF files. Why? I need to have the PDF documents parsed. I
>get correct data when running conv_doc.pl, but nothing with htdig.
>

I presume we are talking about version 3.1.6 here? I had a lot of
difficulty with this version when running external parsers, with the same
sort of symptom, i.e. excerpt deleted. I disabled the acroread invocation
(which had worked, as above, when invoked manually to test) and moved
directly to pdf2html.pl, as below.

Curiously, we have only been able to get external parsers to work if they
are invoked from a script, as below. Our attempts to run executables
directly (as in the disabled acroread example below) all result in the
above symptom, so we now call the executables from a small script that
calls them with four arguments. I might mention that we did not get this
problem with v 3.1.4, and we currently remain baffled as to the difference.

The following is from our conf file:

#pdf_parser: /usr/adobe/Acrobat4.0/bin/acroread -toPostScript
external_parsers: application/pdf->text/html /var/www/htdig/scripts/doc2html/pdf2html.pl
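
For anyone wanting to try the wrapper approach without doc2html, here is a minimal sketch of such a script in Python. It is not our pdf2html.pl. It assumes the four arguments are the ones the ht://Dig documentation lists for external parsers (input file, content type, URL, configuration file), that pdftotext is on the PATH, and that wrapping the extracted text in a bare HTML page is enough for a "->text/html" converter. The names parse_args, to_html and PDFTOTEXT are made up for this example.

```python
#!/usr/bin/env python3
"""Hypothetical wrapper script for an ht://Dig external converter (PDF to HTML)."""
import html
import subprocess
import sys

# Assumption: pdftotext is installed and on PATH; adjust to a full path if not.
PDFTOTEXT = "pdftotext"


def parse_args(argv):
    """Name the four arguments assumed to be appended by htdig."""
    if len(argv) != 4:
        raise ValueError("expected: infile content-type url config-file")
    infile, content_type, url, config = argv
    return {"infile": infile, "content_type": content_type,
            "url": url, "config": config}


def to_html(text, title):
    """Wrap plain text in a minimal HTML page, escaping markup characters."""
    return ("<html><head><title>%s</title></head>\n"
            "<body><pre>%s</pre></body></html>\n"
            % (html.escape(title), html.escape(text)))


def main(argv):
    args = parse_args(argv)
    # "-" as the output file tells pdftotext to write the text to stdout.
    result = subprocess.run([PDFTOTEXT, args["infile"], "-"],
                            capture_output=True, text=True, check=True)
    # htdig reads the converted document from this script's stdout.
    sys.stdout.write(to_html(result.stdout, args["url"]))
    return 0


# Only act when called with the four arguments described above.
if __name__ == "__main__" and len(sys.argv) == 5:
    sys.exit(main(sys.argv[1:]))
```

To use it, point external_parsers at the script instead of at the executable. Running it by hand with a PDF path and three dummy arguments should print HTML. If that works but htdig still reports "Deleted, no excerpt", the problem is more likely in how htdig invokes the script than in the conversion itself.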

--

Henry Rzepa.
+44 (0870) 132 3747 (eFax) +44 0778 6268 220 (Mobile)
 http://www.ch.ic.ac.uk/rzepa/ Dept. Chemistry, Imperial College, London,
SW7  2AY, UK.


_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to
<[EMAIL PROTECTED]> with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html

