Hello,
I am running htdig 3.1.5 on Solaris 2.6.

PDF indexing works for some files, such as http://public.archi.fr/PubliMIARA/Communique_33.pdf,
but not for others, such as http://public.archi.fr/PubliMIARA/AthisMons.pdf


So I guess my configuration and software are OK:
external_parsers:  application/pdf /usr/local/bin/parsepdf.pl

and I modified that script (parsepdf.pl) to point to the correct paths:
$parser = "/usr/local/bin/pdftotext";
$info   = "/usr/local/bin/pdfinfo";

These two programs come from xpdf for Solaris.
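
A quick sanity check of those two paths could look like the sketch below. It is only an illustration in Python and is not part of htdig or parsepdf.pl; the function name `check_tools` is hypothetical, and the paths are simply the ones set above. It should be run as the same user that runs htdig, since permissions can differ between users.

```python
import os

# Paths as configured in parsepdf.pl (taken from the message above).
TOOLS = {
    "pdftotext": "/usr/local/bin/pdftotext",
    "pdfinfo": "/usr/local/bin/pdfinfo",
}


def check_tools(tools):
    """Return {name: bool}; True if the path is a regular file
    that the current user is allowed to execute."""
    return {
        name: os.path.isfile(path) and os.access(path, os.X_OK)
        for name, path in tools.items()
    }


if __name__ == "__main__":
    for name, ok in check_tools(TOOLS).items():
        print(name, "ok" if ok else "missing or not executable")
```

Since one PDF is indexed correctly, both programs are presumably fine here; the check mainly rules out a permissions difference between an interactive shell and the account that runs htdig.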

When it works, I get this log:
1:0:http://public.archi.fr/PubliMIARA/Communique_33.pdf
New server: public.archi.fr, 80
Retrieval command for http://public.archi.fr/robots.txt: GET /robots.txt HTTP/1.0
User-Agent: htdig/3.1.5 ([EMAIL PROTECTED])
Host: public.archi.fr


Header line: HTTP/1.1 404 Not Found
Header line: Date: Mon, 24 Mar 2003 14:15:54 GMT
Header line: Server: Apache/1.3.26 (Unix) PHP/4.2.2 mod_ssl/2.8.10 OpenSSL/0.9.6e
Header line: Connection: close
Header line: Content-Type: text/html; charset=iso-8859-1
Header line:
returnStatus = 1
pushed
pick: public.archi.fr, # servers = 1
0:0:0:http://public.archi.fr/PubliMIARA/Communique_33.pdf: Retrieval command for http://public.archi.fr/PubliMIARA/Communique_33.pdf: GET /PubliMIARA/Communique_33.pdf HTTP/1.0
User-Agent: htdig/3.1.5 ([EMAIL PROTECTED])
Host: public.archi.fr


Header line: HTTP/1.1 200 OK
Header line: Date: Mon, 24 Mar 2003 14:15:54 GMT
Header line: Server: Apache/1.3.26 (Unix) PHP/4.2.2 mod_ssl/2.8.10 OpenSSL/0.9.6e
Header line: Last-Modified: Fri, 07 Sep 2001 07:39:14 GMT
Translated Fri, 07 Sep 2001 07:39:14 GMT to 2001-09-07 07:39:14 (101)
And converted to Fri, 07 Sep 2001 07:39:14
Header line: ETag: "16bd12-72f1-3b9879a2"
Header line: Accept-Ranges: bytes
Header line: Content-Length: 29425
Header line: Connection: close
Header line: Content-Type: application/pdf
Header line:
returnStatus = 0
Read 8192 from document
Read 8192 from document
Read 8192 from document
Read a total of 29425 bytes


title: Document PDF Communique_33.pdf
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
word: [EMAIL PROTECTED]
...
 size = 29425
pick: public.archi.fr, # servers = 1

When it doesn't work, I get almost the same log, except that there are no "word:" lines.
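
To compare many documents at once, the verbose log can be scanned for "word:" lines per URL. The following is a minimal sketch, assuming the log has the shape shown above (a "Retrieval command for URL: GET ..." line followed by that document's "word:" lines) and that long lines are not wrapped; `count_words_per_url` is a hypothetical helper, not an htdig tool.

```python
import re
from collections import OrderedDict

# Matches the "Retrieval command for <URL>: GET ..." lines seen in the log.
RETRIEVAL = re.compile(r"Retrieval command for (\S+): GET ")


def count_words_per_url(log_lines):
    """Map each retrieved URL to the number of 'word:' lines
    that appear before the next retrieval line."""
    counts = OrderedDict()
    current = None
    for line in log_lines:
        match = RETRIEVAL.search(line)
        if match:
            current = match.group(1)
            counts.setdefault(current, 0)
        elif current is not None and line.lstrip().startswith("word:"):
            counts[current] += 1
    return counts


def urls_without_words(log_lines, suffix=".pdf"):
    """Return the URLs ending in `suffix` for which no words were logged."""
    counts = count_words_per_url(log_lines)
    return [url for url, n in counts.items() if url.endswith(suffix) and n == 0]
```

Passing the lines of a saved log to `urls_without_words` gives the list of PDFs for which the external parser returned nothing, which makes it easier to see whether the failing files have something in common.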

Is the problem in the PDF itself, or is there something else I need to configure?
Any suggestions would be appreciated.
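
One way to separate the two possibilities is to run pdftotext by hand on a local copy of each file and see whether any text comes out. The sketch below assumes that pdftotext writes to standard output when the output file is given as "-", and that the PDFs have been saved locally; the helper names and the local file names in the comment are hypothetical.

```python
import subprocess

PARSER = "/usr/local/bin/pdftotext"  # same path as in parsepdf.pl


def count_words(text):
    """Count whitespace-separated tokens that contain at least one
    letter or digit (pure punctuation and form feeds are ignored)."""
    return sum(1 for token in text.split() if any(ch.isalnum() for ch in token))


def extract_text(pdf_path, parser=PARSER):
    """Run pdftotext on pdf_path, sending the text to stdout.
    Returns (exit_code, text, error_output)."""
    result = subprocess.run(
        [parser, pdf_path, "-"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # latin-1 maps every byte to a character, so decoding cannot fail.
    text = result.stdout.decode("latin-1")
    errors = result.stderr.decode("latin-1")
    return result.returncode, text, errors


# Example use, with hypothetical local copies of the two files:
#   code, text, errors = extract_text("AthisMons.pdf")
#   print(code, count_words(text), errors)
```

If the failing file yields zero words or an error message here as well, the cause is more likely the PDF itself (for example a scanned, image-only document or one with extraction restrictions) than the htdig configuration.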
Thanks



