Hello,

I am running htdig 3.1.5 on Solaris 2.6.
PDF indexing works for some files, such as http://public.archi.fr/PubliMIARA/Communique_33.pdf,
but not for others, such as http://public.archi.fr/PubliMIARA/AthisMons.pdf.
So I guess my configuration and software are OK. My configuration contains:
external_parsers: application/pdf /usr/local/bin/parsepdf.pl
and parsepdf.pl has been modified to use the correct paths:
$parser = "/usr/local/bin/pdftotext";
$info = "/usr/local/bin/pdfinfo";
Both programs come from xpdf for Solaris.
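One check that can separate a parser problem from a problem with the PDF itself is to run pdftotext by hand on a local copy of each file and count the words it produces. The following is only a minimal sketch: it assumes pdftotext lives at the path given above, that passing "-" as the output file makes it write to standard output, and the function names are made up for illustration.

```python
import subprocess


def extract_text(pdf_path, pdftotext="/usr/local/bin/pdftotext"):
    """Run pdftotext on pdf_path and return the text it writes to stdout.

    The default binary path is the one from the post; adjust as needed.
    """
    # "-" as the output file asks pdftotext to write to standard output.
    result = subprocess.run(
        [pdftotext, pdf_path, "-"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    if result.returncode != 0:
        message = result.stderr.decode("latin-1").strip()
        raise RuntimeError("pdftotext failed: " + message)
    # latin-1 decoding never fails, which is good enough for counting words.
    return result.stdout.decode("latin-1")


def count_words(text):
    """Count whitespace-separated tokens in the extracted text."""
    return len(text.split())
```

Calling count_words(extract_text("AthisMons.pdf")) on a downloaded copy (the local file name is hypothetical) and getting 0 would suggest that pdftotext itself finds no text in that file, so the htdig configuration would not be the cause. A non-zero count would point back at parsepdf.pl or the htdig setup.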
When it works, I get the following log:
1:0:http://public.archi.fr/PubliMIARA/Communique_33.pdf
New server: public.archi.fr, 80
Retrieval command for http://public.archi.fr/robots.txt: GET /robots.txt HTTP/1.0
User-Agent: htdig/3.1.5 ([EMAIL PROTECTED])
Host: public.archi.fr
Header line: HTTP/1.1 404 Not Found
Header line: Date: Mon, 24 Mar 2003 14:15:54 GMT
Header line: Server: Apache/1.3.26 (Unix) PHP/4.2.2 mod_ssl/2.8.10 OpenSSL/0.9.6e
Header line: Connection: close
Header line: Content-Type: text/html; charset=iso-8859-1
Header line:
returnStatus = 1
pushed
pick: public.archi.fr, # servers = 1
0:0:0:http://public.archi.fr/PubliMIARA/Communique_33.pdf: Retrieval command for http://public.archi.fr/PubliMIARA/Communique_33.pdf: GET /PubliMIARA/Communique_33.pdf HTTP/1.0
User-Agent: htdig/3.1.5 ([EMAIL PROTECTED])
Host: public.archi.fr
Header line: HTTP/1.1 200 OK
Header line: Date: Mon, 24 Mar 2003 14:15:54 GMT
Header line: Server: Apache/1.3.26 (Unix) PHP/4.2.2 mod_ssl/2.8.10 OpenSSL/0.9.6e
Header line: Last-Modified: Fri, 07 Sep 2001 07:39:14 GMT
Translated Fri, 07 Sep 2001 07:39:14 GMT to 2001-09-07 07:39:14 (101)
And converted to Fri, 07 Sep 2001 07:39:14
Header line: ETag: "16bd12-72f1-3b9879a2"
Header line: Accept-Ranges: bytes
Header line: Content-Length: 29425
Header line: Connection: close
Header line: Content-Type: application/pdf
Header line:
returnStatus = 0
Read 8192 from document
Read 8192 from document
Read 8192 from document
Read a total of 29425 bytes
title: Document PDF Communique_33.pdf
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
...
size = 29425
pick: public.archi.fr, # servers = 1
When it doesn't work, I get almost the same log, except that the "word:" lines are missing.
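To spot every affected document in a longer run, the log can be scanned for documents that reach the "size =" line without any "word:" entries. This is a rough sketch that assumes the log format matches the excerpt above (unwrapped "Retrieval command for URL: GET ..." lines, then "word:" entries, then "size = N"); the function name is made up for illustration.

```python
import re

# Patterns taken from the log excerpt; other htdig versions may differ.
RETRIEVAL = re.compile(r"Retrieval command for (\S+): GET ")
WORD = re.compile(r"\bword:")
SIZE = re.compile(r"\bsize = (\d+)")


def words_per_document(log_lines):
    """Return {url: number of 'word:' entries} for each parsed document."""
    counts = {}
    current_url = None
    current_words = 0
    for line in log_lines:
        match = RETRIEVAL.search(line)
        if match:
            # A new retrieval starts; reset the counter for this URL.
            current_url = match.group(1)
            current_words = 0
        current_words += len(WORD.findall(line))
        # "size = N" closes the parse of the current document.
        if SIZE.search(line) and current_url is not None:
            counts[current_url] = current_words
            current_url = None
            current_words = 0
    return counts
```

Passing the lines of a saved log to words_per_document and listing the URLs whose count is 0 gives the set of PDFs that were fetched but produced no indexed words.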
Does the problem come from the PDF itself, or do I have to configure something else properly? Any suggestions would be appreciated.

Thanks

