"David Adams" <[EMAIL PROTECTED]> writes:

> Your external_parsers: statement looks OK to me, but it seems that htdig is
> ignoring it.
>
> Do you get any error messages from htdig which might suggest a problem?
> Version 3.2 will ignore all attributes following an error in the
> configuration file.

I'm using 3.1.6, but I do get an error message:

# rundig -vvv > /tmp/htlogg
# DB2 problem...: missing or empty key value specified

What does this mean?

>
> Check also that you have only one external_parsers: statement in your
> configuration file, that the line immediately before external_parsers: is
> correct and doesn't end in a backslash, and that you are using the correct
> configuration file.

I was planning to include the htdig.conf file minus comments here, but I tried
the stripped conf file first and, wonder of wonders, it worked! YES :)

Thank you, David, this made my day :)

Vidar

>
> David Adams
> Corporate Information Services
> Information Systems Services
> University of Southampton
>
> ----- Original Message -----
> From: "Vidar Ringstrom" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Sent: Tuesday, June 10, 2003 12:21 PM
> Subject: [htdig] PDF::parse: cannot open acroread output
>
> > What am I missing?
> >
> > doc2html.pl, pdfinfo and pdftotext work from the command line.
> >
> > Some variables from htdig.conf:
> >
> > start_url: http://www.tfb.no/db/adresseboktrondheim/1905/3_7_20030528_134812.pdf
> > max_head_length: 10000
> > max_doc_size: 15000000
> > external_parsers: application/pdf->text/html /usr/local/bin/doc2html.pl
> >
> > Output from: htdig -vvv
> >
> > 1:1:http://www.tfb.no/db/adresseboktrondheim/1905/3_7_20030528_134812.pdf
> > New server: www.tfb.no, 80
> > Retrieval command for http://www.tfb.no/robots.txt: GET /robots.txt HTTP/1.0
> > User-Agent: htdig/3.1.6 ([EMAIL PROTECTED])
> > Authorization: Basic aHRkaWdyb2JvdDplbmhldDg1MA==
> > Host: www.tfb.no
> >
> > Header line: HTTP/1.1 200 OK
> > Header line: Date: Tue, 10 Jun 2003 11:09:56 GMT
> > Header line: Server: Apache/1.3.27 (Unix) PHP/4.3.1 FrontPage/4.0.4.3 mod_ssl/2.8.14 OpenSSL/0.9.7a
> > Header line: Last-Modified: Fri, 25 Oct 2002 07:28:28 GMT
> > Converted Fri, 25 Oct 2002 07:28:28 GMT to Fri, 25 Oct 2002 07:28:28
> > Header line: ETag: "35f514-32-3db8f29c"
> > Header line: Accept-Ranges: bytes
> > Header line: Content-Length: 50
> > Header line: Connection: close
> > Header line: Content-Type: text/plain
> > Header line:
> > returnStatus = 0
> > Read 50 from document
> > Read a total of 50 bytes
> > Parsing robots.txt file using myname = htdig
> > Robots.txt line: # robots.txt
> > Robots.txt line: #
> > Robots.txt line: User-agent: *
> > Found 'user-agent' line: *
> > Robots.txt line: Disallow: /cgi-bin/
> > Found 'disallow' line: /cgi-bin/
> > Pattern: /cgi-bin/
> > pushed
> > pick: www.tfb.no, # servers = 1
> > 0:0:0:http://www.tfb.no/db/adresseboktrondheim/1905/3_7_20030528_134812.pdf: Retrieval command for http://www.tfb.no/db/adresseboktrondheim/1905/3_7_20030528_134812.pdf: GET /db/adresseboktrondheim/1905/3_7_20030528_134812.pdf HTTP/1.0
> > User-Agent: htdig/3.1.6 ([EMAIL PROTECTED])
> > Authorization: Basic aHRkaWdyb2JvdDplbmhldDg1MA==
> > Host: www.tfb.no
> >
> > Header line: HTTP/1.1 200 OK
> > Header line: Date: Tue, 10 Jun 2003 11:09:56 GMT
> > Header line: Server: Apache/1.3.27 (Unix) PHP/4.3.1 FrontPage/4.0.4.3 mod_ssl/2.8.14 OpenSSL/0.9.7a
> > Header line: Last-Modified: Wed, 28 May 2003 10:51:51 GMT
> > Converted Wed, 28 May 2003 10:51:51 GMT to Wed, 28 May 2003 10:51:51
> > Header line: ETag: "2d7886-ef80b-3ed494c7"
> > Header line: Accept-Ranges: bytes
> > Header line: Content-Length: 981003
> > Header line: Connection: close
> > Header line: Content-Type: application/pdf
> > Header line:
> > returnStatus = 0
> > Read 8192 from document
> >
> > - removed several more lines with "Read 8192 from document"
> >
> > Read 6155 from document
> > Read a total of 981003 bytes
> > PDF::setContents(981003 bytes)
> > PDF::parse(http://www.tfb.no/db/adresseboktrondheim/1905/3_7_20030528_134812.pdf)
> > PDF::parse: cannot open acroread output from http://www.tfb.no/db/adresseboktrondheim/1905/3_7_20030528_134812.pdf
> > size = 981003
> > pick: www.tfb.no, # servers = 1
> >
> > Regards, Vidar
> >
> > --
> > Vidar Ringstrøm                  Phone 33 11 68 00
> > Bibliotek-Systemer As            Fax   33 11 68 22
> > Boks 2093, Stubberød, 3255 Larvik
> >
> > -------------------------------------------------------
> > This SF.net email is sponsored by: Etnus, makers of TotalView, The best
> > thread debugger on the planet. Designed with thread debugging features
> > you've never dreamed of, try TotalView 6 free at www.etnus.com.
> > _______________________________________________
> > htdig-general mailing list <[EMAIL PROTECTED]>
> > To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of unsubscribe
> > FAQ: http://htdig.sourceforge.net/FAQ.html
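David's checklist (only one external_parsers: line, and no stray backslash on the line before it) and Vidar's fix (testing with a copy of htdig.conf stripped of comments) can be approximated with a short script. This is a minimal Python sketch under stated assumptions, not htdig's own configuration parser: it treats a line starting with # as a comment and a trailing backslash as continuing the value onto the next line. The names `lint_htdig_conf`, `ATTR_RE` and `SAMPLE` are hypothetical, and the sample configuration is made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical helper, not part of htdig. Matches "name:" at the start of a
# line but not URL schemes such as "http://".
ATTR_RE = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*:(?!//)")


def lint_htdig_conf(text):
    """Strip comments from htdig.conf-style text and report likely problems.

    Assumptions (a simplification, not htdig's own parser): a line whose
    first non-blank character is '#' is a comment, a line ending in a
    backslash continues onto the next physical line, and any other
    non-blank line starts a new 'attribute: value' entry.
    Returns (stripped_text, warnings).
    """
    kept, warnings = [], []
    starts = defaultdict(list)  # attribute name -> line numbers where it begins
    continued = False  # did the previous kept line end in a backslash?
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.rstrip()
        if continued:
            # Part of the previous value, even if it looks like "name: value".
            m = ATTR_RE.match(line)
            if m:
                warnings.append(
                    f"line {lineno}: '{m.group(1)}:' is swallowed by a "
                    f"backslash continuation on line {lineno - 1}"
                )
        elif not line.strip() or line.lstrip().startswith("#"):
            continue  # drop blank lines and comments
        else:
            m = ATTR_RE.match(line)
            if m:
                starts[m.group(1)].append(lineno)
        kept.append(line)
        continued = line.endswith("\\")

    # An attribute that starts on more than one line is set more than once.
    for name, linenos in starts.items():
        if len(linenos) > 1:
            where = ", ".join(map(str, linenos))
            warnings.append(f"'{name}:' is set {len(linenos)} times (lines {where})")
    return "\n".join(kept), warnings


# Hypothetical excerpt: the backslash after max_doc_size hides the next line,
# and external_parsers: appears twice.
SAMPLE = """\
# hypothetical htdig.conf excerpt
start_url: http://www.example.com/
external_parsers: application/pdf->text/html /usr/local/bin/doc2html.pl
max_doc_size: 15000000 \\
max_head_length: 10000
external_parsers: application/msword->text/html /usr/local/bin/doc2html.pl
"""

if __name__ == "__main__":
    stripped, problems = lint_htdig_conf(SAMPLE)
    for problem in problems:
        print("warning:", problem)
    print(stripped)
```

To check a real file, pass its contents instead of SAMPLE, e.g. `lint_htdig_conf(open("htdig.conf").read())`. Because the script only approximates htdig's parser, a clean result does not prove the configuration is correct; it only screens for the two mistakes David mentions.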

