What am I missing?

doc2html.pl, pdfinfo and pdftotext all work from the command line.

Some variables from htdig.conf:

start_url: http://www.tfb.no/db/adresseboktrondheim/1905/3_7_20030528_134812.pdf
max_head_length:        10000
max_doc_size:           15000000
external_parsers: application/pdf->text/html /usr/local/bin/doc2html.pl
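To check how the external_parsers line above is meant to be read, here is a minimal Python sketch. It assumes the value is a whitespace-separated list of pairs: the first item is either a content type or a type1->type2 conversion, and the second is the parser command, quoted if it carries arguments. It also assumes htdig compares the first item against the MIME type in the server's Content-Type header. The functions parse_external_parsers and parser_for are hypothetical helpers, not part of ht://Dig.

```python
import shlex
from typing import Dict, Optional, Tuple

Entry = Tuple[Optional[str], str]  # (output type or None, parser command)


def parse_external_parsers(value: str) -> Dict[str, Entry]:
    """Map each input content type to (output type or None, parser command)."""
    tokens = shlex.split(value)  # keeps a quoted "command args" together
    if len(tokens) % 2:
        raise ValueError("external_parsers needs content-type/command pairs")
    parsers: Dict[str, Entry] = {}
    for spec, command in zip(tokens[0::2], tokens[1::2]):
        # "type1->type2" marks a converter; a bare type marks a parser.
        in_type, sep, out_type = spec.partition("->")
        parsers[in_type.lower()] = (out_type if sep else None, command)
    return parsers


def parser_for(parsers: Dict[str, Entry], content_type_header: str) -> Optional[Entry]:
    """Return the entry matching a Content-Type header value, or None.

    Assumption: matching is on the bare MIME type, case-insensitively.
    """
    # Drop parameters such as "; charset=..." before comparing MIME types.
    mime = content_type_header.split(";", 1)[0].strip().lower()
    return parsers.get(mime)


parsers = parse_external_parsers(
    "application/pdf->text/html /usr/local/bin/doc2html.pl"
)
print(parser_for(parsers, "application/pdf"))
# ('text/html', '/usr/local/bin/doc2html.pl')
```

The server does send "Content-Type: application/pdf" for this file (see the log), so under these assumptions the entry should match.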


Output from htdig -vvv:

1:1:http://www.tfb.no/db/adresseboktrondheim/1905/3_7_20030528_134812.pdf
New server: www.tfb.no, 80
Retrieval command for http://www.tfb.no/robots.txt: GET /robots.txt HTTP/1.0
User-Agent: htdig/3.1.6 ([EMAIL PROTECTED])
Authorization: Basic aHRkaWdyb2JvdDplbmhldDg1MA==
Host: www.tfb.no

Header line: HTTP/1.1 200 OK
Header line: Date: Tue, 10 Jun 2003 11:09:56 GMT
Header line: Server: Apache/1.3.27 (Unix) PHP/4.3.1 FrontPage/4.0.4.3 mod_ssl/2.8.14 OpenSSL/0.9.7a
Header line: Last-Modified: Fri, 25 Oct 2002 07:28:28 GMT
Converted Fri, 25 Oct 2002 07:28:28 GMT to Fri, 25 Oct 2002 07:28:28
Header line: ETag: "35f514-32-3db8f29c"
Header line: Accept-Ranges: bytes
Header line: Content-Length: 50
Header line: Connection: close
Header line: Content-Type: text/plain
Header line: 
returnStatus = 0
Read 50 from document
Read a total of 50 bytes
Parsing robots.txt file using myname = htdig
Robots.txt line: # robots.txt
Robots.txt line: #
Robots.txt line: User-agent: *
Found 'user-agent' line: *
Robots.txt line: Disallow: /cgi-bin/
Found 'disallow' line: /cgi-bin/
Pattern: /cgi-bin/
 pushed
pick: www.tfb.no, # servers = 1
0:0:0:http://www.tfb.no/db/adresseboktrondheim/1905/3_7_20030528_134812.pdf: Retrieval command for http://www.tfb.no/db/adresseboktrondheim/1905/3_7_20030528_134812.pdf: GET /db/adresseboktrondheim/1905/3_7_20030528_134812.pdf HTTP/1.0
User-Agent: htdig/3.1.6 ([EMAIL PROTECTED])
Authorization: Basic aHRkaWdyb2JvdDplbmhldDg1MA==
Host: www.tfb.no

Header line: HTTP/1.1 200 OK
Header line: Date: Tue, 10 Jun 2003 11:09:56 GMT
Header line: Server: Apache/1.3.27 (Unix) PHP/4.3.1 FrontPage/4.0.4.3 mod_ssl/2.8.14 OpenSSL/0.9.7a
Header line: Last-Modified: Wed, 28 May 2003 10:51:51 GMT
Converted Wed, 28 May 2003 10:51:51 GMT to Wed, 28 May 2003 10:51:51
Header line: ETag: "2d7886-ef80b-3ed494c7"
Header line: Accept-Ranges: bytes
Header line: Content-Length: 981003
Header line: Connection: close
Header line: Content-Type: application/pdf
Header line: 
returnStatus = 0
Read 8192 from document

    [several more "Read 8192 from document" lines removed]

Read 6155 from document
Read a total of 981003 bytes
PDF::setContents(981003 bytes)
PDF::parse(http://www.tfb.no/db/adresseboktrondheim/1905/3_7_20030528_134812.pdf)
PDF::parse: cannot open acroread output from http://www.tfb.no/db/adresseboktrondheim/1905/3_7_20030528_134812.pdf
 size = 981003
pick: www.tfb.no, # servers = 1
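The PDF::parse lines suggest that htdig fell back to its built-in PDF handling (which tries to read acroread output) instead of calling doc2html.pl. Running pdftotext by hand is not quite the same test as running doc2html.pl the way htdig would. As I understand the ht://Dig 3.1 documentation for external_parsers, the configured command is called with four extra arguments: the input file, the content type, the URL and the configuration file. The minimal Python sketch below builds that command line under this assumption so it can be pasted into a shell. The helper name external_parser_argv, the local file /tmp/test.pdf and the config path /etc/htdig/htdig.conf are hypothetical placeholders.

```python
import shlex


def external_parser_argv(command, infile, content_type, url, config_file):
    """Build the argument list for an external parser (hypothetical helper).

    Assumed calling convention: the configured command (which may carry its
    own arguments) followed by infile, content type, URL and config file.
    """
    return shlex.split(command) + [infile, content_type, url, config_file]


argv = external_parser_argv(
    "/usr/local/bin/doc2html.pl",
    "/tmp/test.pdf",  # hypothetical: a local copy of the PDF
    "application/pdf",
    "http://www.tfb.no/db/adresseboktrondheim/1905/3_7_20030528_134812.pdf",
    "/etc/htdig/htdig.conf",  # hypothetical: the htdig.conf actually in use
)

# Print a shell-safe command line for running the parser by hand.
print(shlex.join(argv))
```

If doc2html.pl prints HTML when run this way, the converter itself is probably fine, and the problem more likely lies in whether htdig picks up the external_parsers entry at all.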




Regards, Vidar

--
Vidar Ringstrøm                         Phone 33 11 68 00
Bibliotek-Systemer As           Fax 33 11 68 22
Boks 2093, Stubberød, 3255 Larvik


_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
