Hello,

I am trying to use doc2html.pl (v3.0 4-June-2001), pdf2html.pl (v1.0 25-May-2001) and xpdf (v1.0) to index PDF documents with htdig (v3.1.6). Running doc2html.pl at the command line produces HTML output for the test PDF file I am using, e.g.:
fortuna:/$ /home/flyingfish/doc2html/doc2html.pl /home/flyingfish/www/members/callsheets/ACROBAT.PDF application/pdf
<HTML>
<HEAD>
<TITLE>Adobe Acrobat Reader UpSell PDF</TITLE>
<META NAME="DESCRIPTION" CONTENT="There's more to Acrobat than the Reader!">
</HEAD>
<BODY>
bc <br>
...

However when I use rundig the following happens.

With htdig.conf containing:

external_parsers: application/pdf->text/html /home/fylingfish/doc2thml/doc2html.pl

...
pick: www.flyingfish.co.nz, # servers = 1
0:0:0:http://www.flyingfish.co.nz/members/callsheets/ACROBAT.PDF: Retrieval command for http://www.flyingfish.co.nz/members/callsheets/ACROBAT.PDF: GET /members/callsheets/ACROBAT.PDF HTTP/1.0
User-Agent: htdig/3.1.6 ([EMAIL PROTECTED])
Host: www.flyingfish.co.nz

Header line: HTTP/1.1 200 OK
Header line: Date: Thu, 11 Apr 2002 00:06:58 GMT
Header line: Server: Apache/1.3.19 (Unix) FrontPage/5.0.2.2510
Header line: Last-Modified: Wed, 10 Apr 2002 22:23:36 GMT
Converted Wed, 10 Apr 2002 22:23:36 GMT to Wed, 10 Apr 2002 22:23:36
Header line: ETag: "72efe-6f23-3cb4bb68"
Header line: Accept-Ranges: bytes
Header line: Content-Length: 28451
Header line: Connection: close
Header line: Content-Type: application/pdf
Header line:
returnStatus = 0
Read 8192 from document
Read 8192 from document
Read 8192 from document
Read 3875 from document
Read a total of 28451 bytes
size = 28451
pick: www.flyingfish.co.nz, # servers = 1
htmerge: Sorting...
DB2 problem...: missing or empty key value specified
...

With external parser line commented out of config file:

...
Header line: HTTP/1.1 200 OK
Header line: Date: Thu, 11 Apr 2002 00:09:45 GMT
Header line: Server: Apache/1.3.19 (Unix) FrontPage/5.0.2.2510
Header line: Last-Modified: Wed, 10 Apr 2002 22:23:36 GMT
Converted Wed, 10 Apr 2002 22:23:36 GMT to Wed, 10 Apr 2002 22:23:36
Header line: ETag: "72efe-6f23-3cb4bb68"
Header line: Accept-Ranges: bytes
Header line: Content-Length: 28451
Header line: Connection: close
Header line: Content-Type: application/pdf
Header line:
returnStatus = 0
Read 8192 from document
Read 8192 from document
Read 8192 from document
Read 3875 from document
Read a total of 28451 bytes
PDF::setContents(28451 bytes)
PDF::parse(http://www.flyingfish.co.nz/members/callsheets/ACROBAT.PDF)
PDF::parse: cannot find pdf parser /usr/local/bin/acroread
size = 28451
pick: www.flyingfish.co.nz, # servers = 1
htmerge: Sorting...
DB2 problem...: missing or empty key value specified
Deleted, no excerpt: 0/http://www.flyingfish.co.nz/members/callsheets/ACROBAT.PDF
...

I have read all the READMEs, DETAILs, FAQ and Mail Archives I can find, but I just can't see what the error is. [There are no spaces at the end of lines in the htdig.conf file and the max_doc_size is fine. ;-]

Regards

James Robertson
Web Developer - Composite Design
mailto:[EMAIL PROTECTED]
+64-4-973 4555

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html

