According to Christian Fredrickson:
> >I have switched to using conv_doc.pl to parse my PDF files. I have run
> >this and pdftotext to make certain the output was text and everything
> >ran correctly. It all works perfectly, but when running htdig I see:
> >Deleted, no excerpt: 7/http://
> >for all the PDF files. WHY? I need to have the PDF documents parsed, but I
> >get correct data when running conv_doc.pl, but nothing with htdig.

According to Rzepa, Henry:
> I presume we are talking  version 3.1.6 here?

I've learned not to assume such things unless it's absolutely clear by
context what version the person is talking about.  There have always
been all sorts of issues that can lead to the error above, regardless
of which version you're running.  It may be that Christian is having
the same problem as you, but it ain't necessarily so.

Christian, could you specify which version of ht://Dig you're running,
and on what platform (OS, version, distribution if applicable)?  Also,
I recommend you try running htdig -ivvvv with the URL of a PDF file as
start_url, and look for clues in the verbose output of htdig.

>  I had a lot of difficulties
> with this version in running external parsers,  with the same sort of syndrome,
> ie excerpt deleted. I disabled the  acroread invocation (which had worked, as
> above, when invoked manually to test) and moved directly to  pdf2html.pl
> as below.
> 
> Curiously, we have only been able to get external parsers to work if they
> are invoked from a script, as below. Our attempts to run executables directly
> (as in the disabled Acroread example below)  all result in the above
> syndrome. so we now call the executables from a small script which calls
> them with four arguments. I might mention that we did not get this problem
> with v 3.1.4, and currently remain baffled as to the difference.

The main difference in external_parsers handling between 3.1.5 and
3.1.6 is that the parser is now called directly via fork() and execv(),
rather than by a popen() call, which invokes the shell to parse a command
line and call the parser.  I don't see how this would prevent it from
running executables directly, though, as execv() can certainly handle
executables as long as the path is correct.
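
To illustrate the two calling styles, here is a minimal sketch in C.
It is not the actual ht://Dig code: the function names run_via_shell()
and run_direct() are made up for this example, and all it does is show
how popen() goes through the shell while fork() plus execv() does not.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Shell style: popen() hands the whole command line to /bin/sh, which
 * splits it into words and searches PATH for the program.
 * Returns the command's exit status, or -1 on failure. */
int run_via_shell(const char *cmdline)
{
    FILE *fp = popen(cmdline, "r");
    char buf[4096];
    int status;

    if (fp == NULL)
        return -1;
    while (fread(buf, 1, sizeof buf, fp) > 0)
        ;                       /* discard output; a real caller would parse it */
    status = pclose(fp);
    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/* Direct style: fork() a child and execv() the program in it.  No shell
 * is involved, so `path` must name the executable file exactly (execv
 * does not search PATH) and argv is passed through untouched.
 * Returns the program's exit status, 127 if it could not be executed,
 * or -1 on failure. */
int run_direct(const char *path, char *const argv[])
{
    pid_t pid = fork();
    int status;

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execv(path, argv);
        _exit(127);             /* only reached if execv() failed */
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

With these helpers, run_via_shell("exit 3") returns 3 because the shell
interprets the command, while run_direct("sh", argv) returns 127 unless
a file named sh exists in the current directory; giving the full path,
as in run_direct("/bin/sh", argv), works.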

Also, this change has no bearing on the handling of pdf_parser, which
has remained essentially unchanged since 3.1.3, so if that stopped working
for you in 3.1.6 it's for another reason than the htdig code itself.

> The below is from our conf file
> 
> #pdf_parser: /usr/adobe/Acrobat4.0/bin/acroread -toPostScript
> external_parsers: application/pdf->text/html /var/www/htdig/scripts/doc2html/pdf2html.pl

I don't see a problem with either of those lines, as long as the paths
are correct and the programs work.  I would be interested in knowing
more about which problems you have with 3.1.6 that you're sure didn't
happen with 3.1.4 or 3.1.5.  If you can pin it down to something solid
and reproducible, please let us know the error messages and/or verbose
output from htdig, as well as details about your platform.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
