What am I missing? doc2html.pl, pdfinfo and pdftotext all work from the command line.
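A minimal sketch of that command-line check, for anyone who wants to repeat it. It only builds and prints the three commands. The doc2html.pl path is the one from the external_parsers setting in htdig.conf; the local file name /tmp/sample.pdf is hypothetical, and the argument order given to doc2html.pl (file, MIME type, URL) is an assumption about how an external parser is normally invoked, not something shown in the log.

```python
import shlex

# Path taken from the external_parsers line in htdig.conf.
DOC2HTML = "/usr/local/bin/doc2html.pl"
PDF_URL = "http://www.tfb.no/db/adresseboktrondheim/1905/3_7_20030528_134812.pdf"


def build_checks(local_pdf, url=PDF_URL):
    """Return the commands for a manual check of one locally saved PDF."""
    return [
        # xpdf tools: document metadata, then extracted text on stdout.
        ["pdfinfo", local_pdf],
        ["pdftotext", local_pdf, "-"],
        # Assumed external-parser argument order: file, MIME type, URL.
        [DOC2HTML, local_pdf, "application/pdf", url],
    ]


if __name__ == "__main__":
    # /tmp/sample.pdf is a hypothetical local copy of the PDF above.
    for cmd in build_checks("/tmp/sample.pdf"):
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Each printed line can be pasted into a shell, or the lists can be handed to subprocess.run to capture the converter output.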
Some variables from htdig.conf:

start_url: http://www.tfb.no/db/adresseboktrondheim/1905/3_7_20030528_134812.pdf
max_head_length: 10000
max_doc_size: 15000000
external_parsers: application/pdf->text/html /usr/local/bin/doc2html.pl

Output from: htdig -vvv

1:1:http://www.tfb.no/db/adresseboktrondheim/1905/3_7_20030528_134812.pdf
New server: www.tfb.no, 80
Retrieval command for http://www.tfb.no/robots.txt: GET /robots.txt HTTP/1.0
User-Agent: htdig/3.1.6 ([EMAIL PROTECTED])
Authorization: Basic aHRkaWdyb2JvdDplbmhldDg1MA==
Host: www.tfb.no

Header line: HTTP/1.1 200 OK
Header line: Date: Tue, 10 Jun 2003 11:09:56 GMT
Header line: Server: Apache/1.3.27 (Unix) PHP/4.3.1 FrontPage/4.0.4.3 mod_ssl/2.8.14 OpenSSL/0.9.7a
Header line: Last-Modified: Fri, 25 Oct 2002 07:28:28 GMT
Converted Fri, 25 Oct 2002 07:28:28 GMT to Fri, 25 Oct 2002 07:28:28
Header line: ETag: "35f514-32-3db8f29c"
Header line: Accept-Ranges: bytes
Header line: Content-Length: 50
Header line: Connection: close
Header line: Content-Type: text/plain
Header line:
returnStatus = 0
Read 50 from document
Read a total of 50 bytes
Parsing robots.txt file using myname = htdig
Robots.txt line: # robots.txt
Robots.txt line: #
Robots.txt line: User-agent: *
Found 'user-agent' line: *
Robots.txt line: Disallow: /cgi-bin/
Found 'disallow' line: /cgi-bin/
Pattern: /cgi-bin/
pushed
pick: www.tfb.no, # servers = 1
0:0:0:http://www.tfb.no/db/adresseboktrondheim/1905/3_7_20030528_134812.pdf: Retrieval command for http://www.tfb.no/db/adresseboktrondheim/1905/3_7_20030528_134812.pdf: GET /db/adresseboktrondheim/1905/3_7_20030528_134812.pdf HTTP/1.0
User-Agent: htdig/3.1.6 ([EMAIL PROTECTED])
Authorization: Basic aHRkaWdyb2JvdDplbmhldDg1MA==
Host: www.tfb.no

Header line: HTTP/1.1 200 OK
Header line: Date: Tue, 10 Jun 2003 11:09:56 GMT
Header line: Server: Apache/1.3.27 (Unix) PHP/4.3.1 FrontPage/4.0.4.3 mod_ssl/2.8.14 OpenSSL/0.9.7a
Header line: Last-Modified: Wed, 28 May 2003 10:51:51 GMT
Converted Wed, 28 May 2003 10:51:51 GMT to Wed, 28 May 2003 10:51:51
Header line: ETag: "2d7886-ef80b-3ed494c7"
Header line: Accept-Ranges: bytes
Header line: Content-Length: 981003
Header line: Connection: close
Header line: Content-Type: application/pdf
Header line:
returnStatus = 0
Read 8192 from document
[several more "Read 8192 from document" lines removed]
Read 6155 from document
Read a total of 981003 bytes
PDF::setContents(981003 bytes)
PDF::parse(http://www.tfb.no/db/adresseboktrondheim/1905/3_7_20030528_134812.pdf)
PDF::parse: cannot open acroread output from http://www.tfb.no/db/adresseboktrondheim/1905/3_7_20030528_134812.pdf
size = 981003
pick: www.tfb.no, # servers = 1

Regards,
Vidar

--
Vidar Ringstrøm          Phone 33 11 68 00
Bibliotek-Systemer As    Fax 33 11 68 22
Boks 2093, Stubberød, 3255 Larvik

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html

