The "Deleted, no excerpt:" message may be due to pdf2html.pl failing to work, or there being no text that can be extracted from the .pdf file.
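To tell those two cases apart, it helps to run the converter by hand on one of the PDFs and look at what comes back. Below is a minimal sketch of such a check in Python. The names `run_converter` and `diagnose` are made up for illustration. It also assumes the converter takes the file name as its last argument and writes its result to standard output, which may not match how your copy of pdf2html.pl is set up.

```python
import subprocess
import sys


def run_converter(command, pdf_path, timeout=60):
    """Run a converter command on one file and summarise what came back.

    `command` is a list such as ["perl", "/path/to/pdf2html.pl"]; the file
    name is appended as the last argument (an assumption, see above).
    """
    result = subprocess.run(
        list(command) + [pdf_path],
        capture_output=True,
        timeout=timeout,
    )
    text = result.stdout.decode("latin-1", errors="replace").strip()
    return {
        # A negative exit code means the process was killed by a signal.
        "exit_code": result.returncode,
        "output_chars": len(text),
        "preview": text[:200],
        "stderr": result.stderr.decode("latin-1", errors="replace").strip(),
    }


def diagnose(report):
    """Turn the report into one of three rough verdicts."""
    if report["exit_code"] != 0:
        return "converter failed"
    if report["output_chars"] == 0:
        return "converter ran but produced no text"
    return "converter produced text"


if __name__ == "__main__":
    if len(sys.argv) >= 3:
        # Everything before the last argument is the converter command.
        report = run_converter(sys.argv[1:-1], sys.argv[-1])
        print(diagnose(report))
        print("exit code:", report["exit_code"])
        print("stderr:", report["stderr"])
        print("preview:", report["preview"])
    else:
        print("usage: check_converter.py CONVERTER [ARGS...] FILE.pdf")
```

Saved as, say, check_converter.py (a hypothetical name), it could be run as `python check_converter.py perl /path/to/pdf2html.pl OrientationsComInterne.pdf`. If the converter emits HTML, a non-empty result may still contain only markup, so read the preview rather than trusting the character count alone.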
Have you tried executing pdf2html.pl from the command line to see what output you get with these .pdf files?

David Adams
Corporate Information Services
Information Systems Services
University of Southampton

----- Original Message -----
From: <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, September 01, 2004 8:38 AM
Subject: [htdig] Coredump when indexing pdf files with Htdig 3.16

> Hi,
>
> On HP Unix 9000, I use htdig 3.1.6 to index our intranet site, which was
> developed with ColdFusion and Sybase.
> All is OK.
>
> Now I am trying to index the same site (or a small part of it) including
> PDF files.
> I have made the modifications in htdig.conf and in the various parameter
> files (pdf2html.pl...).
>
> The beginning of indexing with PDF files seems OK, like this:
> --------------------extract of logfile with -vvv------------------------
> pick: interligne.xxxx.fr, # servers = 1
> 4:4:1:http://interligne.xxxx.fr/5/5.1/OrientationsComInterne.pdf: Retrieval
> command for http://interligne.xxxx.fr/5/5.1/OrientationsComInterne.pdf:
> GET /5/5.1/OrientationsComInterne.pdf HTTP/1.0
> User-Agent: htdig/3.1.6 ([EMAIL PROTECTED])
> Referer: http://interligne.xxxx.fr/5/5.1/
> Host: interligne.xxxx.fr
>
> Header line: HTTP/1.1 200 OK
> Header line: Date: Wed, 01 Sep 2004 06:06:49 GMT
> Header line: Server: Apache/1.3.19 (Unix) PHP/4.0.5
> Header line: Last-Modified: Tue, 15 Jun 2004 06:31:01 GMT
> Converted Tue, 15 Jun 2004 06:31:01 GMT to Tue, 15 Jun 2004 06:31:01
> Header line: ETag: "4741-13bdf-40ce97a5"
> Header line: Accept-Ranges: bytes
> Header line: Content-Length: 80863
> Header line: Connection: close
> Header line: Content-Type: application/pdf
> Header line:
> returnStatus = 0
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 7135 from document
> Read a total of 80863 bytes
> size = 80863
> ---------------end of extract----------------------
>
> but after some time I get this:
> Deleted, no excerpt: 0/http://interligne.xxxx.fr/5/5.1
> and I get the same message for ALL the files after it.
>
> On the console I get this:
> DB2 problem.... : PANIC: Invalid argument
> /opt/www/htdig/bin/rundig[36]: 8043 Memory fault(coredump)
>
> and later, four times:
> DB2 problem...: missing or empty key value specified.
>
>
> I think I have made a mistake with temporary files, but I can't find the
> param...
>
> Does anyone have an idea?
>
> Thanks.
>
> Hervé
>
>
> -------------------------------------------------------
> This SF.Net email is sponsored by BEA Weblogic Workshop
> FREE Java Enterprise J2EE developer tools!
> Get your free copy of BEA WebLogic Workshop 8.1 today.
> http://ads.osdn.com/?ad_idP47&alloc_id808&op=click
> _______________________________________________
> ht://Dig general mailing list: <[EMAIL PROTECTED]>
> ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
> List information (subscribe/unsubscribe, etc.)
> https://lists.sourceforge.net/lists/listinfo/htdig-general