Steve,
 
If you have got
 
    /opt/www/htdig/bin/doc2html.pl /usr/website/pdfs/phoenix.pdf  application/pdf
 
working OK then you are nearly there!
 
What does the htdig -vvv output show for the MIME-type of the .pdf files it GETs?
Is it "application/pdf" exactly or something different?  Does it match the MIME-type
in the external_parsers statement in your config file?  Most likely there is a mistake
in your config file, but it could be that your server is not returning the expected MIME-type.
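One way to compare the two by hand is sketched below. It is only a rough check, not part of ht://Dig: the URL and the config file location are assumptions to be replaced with your own, and media_type is a helper made up for this example. It prints the media type the server sends for the PDF, then the external_parsers entry to compare it against.

```shell
#!/bin/sh
# Hypothetical URL: substitute the address htdig actually fetches.
URL="http://localhost/pdfs/phoenix.pdf"
# Assumed config location: adjust to wherever your htdig.conf lives.
CONF="/opt/www/htdig/conf/htdig.conf"

# Read HTTP response headers on stdin and print the media type in lower
# case, without parameters such as "; charset=..." or the trailing CR.
media_type() {
    tr -d '\r' |
    awk -F': *' 'tolower($1) == "content-type" { sub(/;.*/, "", $2); print tolower($2); exit }'
}

# The live checks only run when the script is called with "check".
if [ "$1" = "check" ]; then
    # What the server reports for the PDF.
    curl -sI "$URL" | media_type
    # What the config expects. A matching entry would look something like:
    #   external_parsers: application/pdf->text/html /opt/www/htdig/bin/doc2html.pl
    grep -n -A3 '^external_parsers' "$CONF"
fi
```

Run it as "sh checkmime.sh check" (the file name is arbitrary). If the first line printed is anything other than application/pdf, htdig will not hand the file to the parser listed for application/pdf.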
 
--
David Adams
Computing Services
Southampton University
----- Original Message -----
Sent: Sunday, March 10, 2002 3:33 PM
Subject: RE: [htdig] Deleted, no excerpt with pdf files

David

I have done a clean re-install and got rid of the perl errors, but the basic symptoms remain the same.

To recap (with an empty database directory), running htdig -vvvv lists the contents of the .pdf as text on the screen, but wordlist.db.work only includes words from index.html, not from the linked .pdf. Then htmerge presumably has a URL pointing to the pdf, but no content, and rejects it as having 'no excerpt'.

Both the tests you suggest print the pdf as HTML on the console, so I guess they pass. Since you said 'If that still fails, then you still haven't configured doc2html.pl correctly', does that mean I have configured doc2html.pl correctly?

Can you suggest any other diagnostics I could run to pin it down?

Otherwise the next step is a clean install of a different distro.

-----Original Message-----
From: David Adams [mailto:[EMAIL PROTECTED]]
Sent: 04 March 2002 16:50
To: [EMAIL PROTECTED]
Cc: htdig
Subject: Re: [htdig] Deleted, no excerpt with pdf files

Steve,
 
You must:
 
    Put the full pathname of your Perl binary in the first line of each of your Perl scripts.
    Configure doc2html.pl with the full pathname of where you have installed pdf2html.pl.
    Configure pdf2html.pl with the full pathnames of where you have installed pdftotext and pdfinfo.
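The three points above can be checked roughly with a sketch like the following. It makes some assumptions: pdf2html.pl is taken to be installed beside doc2html.pl, which may not match your layout, and shebang_of is a helper made up for this example.

```shell
#!/bin/sh
# Print the interpreter path named on the first line of a script,
# or nothing if that line does not start with "#!".
shebang_of() {
    head -n 1 "$1" | sed -n 's/^#! *\([^ ]*\).*/\1/p'
}

# doc2html.pl's path is the one used in this thread; the pdf2html.pl
# path is an assumption, so change it to your own install location.
for script in /opt/www/htdig/bin/doc2html.pl /opt/www/htdig/bin/pdf2html.pl; do
    if [ ! -f "$script" ]; then
        echo "$script: not found"
        continue
    fi
    perl_path=$(shebang_of "$script")
    if [ -n "$perl_path" ] && [ -x "$perl_path" ]; then
        echo "$script: interpreter $perl_path OK"
    else
        echo "$script: interpreter '$perl_path' is missing or not executable"
    fi
done

# Show where the xpdf tools live, so the full paths can be copied
# into pdf2html.pl.
for tool in pdftotext pdfinfo; do
    command -v "$tool" || echo "$tool: not found on PATH"
done
```

This only confirms that the files exist and that the first line names a real interpreter. The paths configured inside doc2html.pl and pdf2html.pl still have to be checked by opening the scripts.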
 
Test pdf2html.pl at the command line:
 
    pdf2html.pl /usr/website/pdfs/phoenix.pdf 
 
If that works then try doc2html.pl:
 
    /opt/www/htdig/bin/doc2html.pl /usr/website/pdfs/phoenix.pdf  application/pdf
 
If that still fails, then you still haven't configured doc2html.pl correctly.
 
--
David Adams
Computing Services
Southampton University
