Hello

I am trying to use doc2html.pl (v3.0 4-June-2001), pdf2html.pl (v1.0
25-May-2001) and xpdf (v1.0) to index PDF documents using htdig (v3.1.6).
Running doc2html at the command line produces HTML output for the test PDF
file I am using, e.g.:

fortuna:/$ /home/flyingfish/doc2html/doc2html.pl /home/flyingfish/www/members/callsheets/ACROBAT.PDF application/pdf
<HTML>
<HEAD>
<TITLE>Adobe Acrobat Reader UpSell PDF</TITLE>
<META NAME="DESCRIPTION" CONTENT="There's more to Acrobat than the Reader!">
</HEAD>
<BODY>
bc
<br>
...

However, when I use rundig, the following happens.

With htdig.conf containing:
external_parsers:       application/pdf->text/html
/home/fylingfish/doc2thml/doc2html.pl

...
pick: www.flyingfish.co.nz, # servers = 1
0:0:0:http://www.flyingfish.co.nz/members/callsheets/ACROBAT.PDF: Retrieval
command for http://www.flyingfish.co.nz/members/callsheets/ACROBAT.PDF: GET
/members/callsheets/ACROBAT.PDF HTTP/1.0
User-Agent: htdig/3.1.6 ([EMAIL PROTECTED])
Host: www.flyingfish.co.nz

Header line: HTTP/1.1 200 OK
Header line: Date: Thu, 11 Apr 2002 00:06:58 GMT
Header line: Server: Apache/1.3.19 (Unix) FrontPage/5.0.2.2510
Header line: Last-Modified: Wed, 10 Apr 2002 22:23:36 GMT
Converted Wed, 10 Apr 2002 22:23:36 GMT to Wed, 10 Apr 2002 22:23:36
Header line: ETag: "72efe-6f23-3cb4bb68"
Header line: Accept-Ranges: bytes
Header line: Content-Length: 28451
Header line: Connection: close
Header line: Content-Type: application/pdf
Header line:
returnStatus = 0
Read 8192 from document
Read 8192 from document
Read 8192 from document
Read 3875 from document
Read a total of 28451 bytes
 size = 28451
pick: www.flyingfish.co.nz, # servers = 1
htmerge: Sorting...
DB2 problem...: missing or empty key value specified
...

With the external_parsers line commented out of the config file:
...
Header line: HTTP/1.1 200 OK
Header line: Date: Thu, 11 Apr 2002 00:09:45 GMT
Header line: Server: Apache/1.3.19 (Unix) FrontPage/5.0.2.2510
Header line: Last-Modified: Wed, 10 Apr 2002 22:23:36 GMT
Converted Wed, 10 Apr 2002 22:23:36 GMT to Wed, 10 Apr 2002 22:23:36
Header line: ETag: "72efe-6f23-3cb4bb68"
Header line: Accept-Ranges: bytes
Header line: Content-Length: 28451
Header line: Connection: close
Header line: Content-Type: application/pdf
Header line:
returnStatus = 0
Read 8192 from document
Read 8192 from document
Read 8192 from document
Read 3875 from document
Read a total of 28451 bytes
PDF::setContents(28451 bytes)
PDF::parse(http://www.flyingfish.co.nz/members/callsheets/ACROBAT.PDF)
PDF::parse: cannot find pdf parser /usr/local/bin/acroread
 size = 28451
pick: www.flyingfish.co.nz, # servers = 1
htmerge: Sorting...
DB2 problem...: missing or empty key value specified

Deleted, no excerpt:
0/http://www.flyingfish.co.nz/members/callsheets/ACROBAT.PDF
...

I have read all the READMEs, DETAILs, FAQ and Mail Archives I can find, but
I just can't see what the error is. [There are no spaces at the end of lines
in the htdig.conf file and the max_doc_size is fine. ;-]
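
One check that may be worth making at this point is whether each program
named on the external_parsers line exists and is executable. As pasted above,
the path on that line (/home/fylingfish/doc2thml/doc2html.pl) is spelled
differently from the one used at the command line
(/home/flyingfish/doc2html/doc2html.pl), and the value is split over two
lines; whether either reflects the real config or only the way this message
was typed and wrapped cannot be told from the output shown. The following is
a minimal Python sketch of such a check, not part of htdig or doc2html. It
assumes that htdig.conf uses "attribute: value" lines, that a trailing
backslash continues a line, and that external_parsers is a whitespace-separated
list of "content-type program" pairs. The function names read_attribute and
check_external_parsers are hypothetical.

```python
import os
import shlex


def read_attribute(conf_text, name):
    """Return the value of attribute `name`, or None if it is not set.

    Assumption: lines ending in a backslash continue on the next line,
    and lines starting with '#' are comments.
    """
    logical, pending = [], ""
    for raw in conf_text.splitlines():
        line = raw.rstrip()
        if line.endswith("\\"):
            pending += line[:-1] + " "
            continue
        logical.append(pending + line)
        pending = ""
    if pending:
        logical.append(pending)

    for line in logical:
        stripped = line.strip()
        if stripped.startswith("#") or ":" not in stripped:
            continue
        key, _, value = stripped.partition(":")
        if key.strip() == name:
            return value.strip()
    return None


def check_external_parsers(conf_text):
    """Return (content_type, path, problem) tuples for unusable parsers."""
    value = read_attribute(conf_text, "external_parsers")
    if not value:
        return []
    tokens = shlex.split(value)
    problems = []
    # Tokens are read as pairs: content-type spec first, then the program.
    for ctype, program in zip(tokens[0::2], tokens[1::2]):
        parts = program.split()
        if not parts:
            problems.append((ctype, "", "no program given"))
            continue
        path = parts[0]  # a quoted program may carry arguments
        if not os.path.isfile(path):
            problems.append((ctype, path, "no such file"))
        elif not os.access(path, os.X_OK):
            problems.append((ctype, path, "not executable"))
    # An odd token count means a content type without a program, which is
    # what this sketch sees if the value is wrapped without a backslash.
    if len(tokens) % 2:
        problems.append((tokens[-1], "", "no program given"))
    return problems


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as fh:
        for ctype, path, problem in check_external_parsers(fh.read()):
            print(f"{ctype}: {path or '(none)'}: {problem}")
```

Run against the config file (for example "python check_parsers.py htdig.conf",
where the script name is arbitrary), it prints one line per parser entry it
cannot resolve and nothing if every entry points at an executable file.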

Regards
James Robertson
Web Developer - Composite Design

mailto:[EMAIL PROTECTED]
+64-4-973 4555


_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
