Re: [htdig] Coredump when indexing pdf files with Htdig 3.16

David Adams Wed, 01 Sep 2004 01:53:03 -0700

The "Deleted, no excerpt:" message may be due to pdf2html.pl failing to
work, or there being no text that can be extracted from the .pdf file.


Have you tried executing pdf2html.pl from the command line to see what
output you get with these .pdf files?
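
If it helps, here is a minimal sketch of that manual check in Python. It assumes the converter accepts the path of a locally saved PDF as its last argument and writes the converted text to standard output. That is an assumption about your setup, not something confirmed in this thread. The name check_converter and the paths in the usage comment are hypothetical.

```python
import subprocess


def check_converter(command, pdf_path, timeout=120):
    """Run a converter on one saved PDF and summarise what came back.

    command  -- program plus any leading arguments, as a list (assumed form)
    pdf_path -- path to a local copy of the PDF, passed as the last argument
    """
    result = subprocess.run(
        command + [pdf_path],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
    )
    text = result.stdout.decode("latin-1", errors="replace")
    return {
        # Non-zero usually means the converter itself failed.
        "exit_code": result.returncode,
        # Zero here means nothing was extracted, so there is nothing to excerpt.
        "output_chars": len(text.strip()),
        # Any complaints from the converter or the tools it calls.
        "stderr": result.stderr.decode("latin-1", errors="replace").strip(),
    }


# Hypothetical usage; adjust both paths to your installation:
# print(check_converter(["/path/to/pdf2html.pl"], "/tmp/OrientationsComInterne.pdf"))
```

A non-zero exit code points at the converter failing, while a zero exit code with no output points at a PDF with no extractable text.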

David Adams
Corporate Information Services
Information Systems Services
University of Southampton

----- Original Message ----- 
From: <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, September 01, 2004 8:38 AM
Subject: [htdig] Coredump when indexing pdf files with Htdig 3.16


> Hi,
>
> On an HP 9000 Unix machine, I use htdig 3.1.6 to index our intranet site,
> which was developed with ColdFusion and Sybase.
> All is OK.
>
> Now I am trying to index the same site (or a small part of it) including
> PDF files.
> I have made the modifications in htdig.conf and in the various parameter
> files (pdf2html.pl...).
>
> The beginning of the indexing run with PDF files seems OK, like this:
> --------------------extract of logfile with -vvv------------------------
> pick: interligne.xxxx.fr, # servers = 1
> 4:4:1:http://interligne.xxxx.fr/5/5.1/OrientationsComInterne.pdf:
> Retrieval command for http://interligne.xxxx.fr/5/5.1/OrientationsComInterne.pdf:
> GET /5/5.1/OrientationsComInterne.pdf HTTP/1.0
> User-Agent: htdig/3.1.6 ([EMAIL PROTECTED])
> Referer: http://interligne.xxxx.fr/5/5.1/
> Host: interligne.xxxx.fr
>
> Header line: HTTP/1.1 200 OK
> Header line: Date: Wed, 01 Sep 2004 06:06:49 GMT
> Header line: Server: Apache/1.3.19 (Unix) PHP/4.0.5
> Header line: Last-Modified: Tue, 15 Jun 2004 06:31:01 GMT
> Converted Tue, 15 Jun 2004 06:31:01 GMT to Tue, 15 Jun 2004 06:31:01
> Header line: ETag: "4741-13bdf-40ce97a5"
> Header line: Accept-Ranges: bytes
> Header line: Content-Length: 80863
> Header line: Connection: close
> Header line: Content-Type: application/pdf
> Header line:
> returnStatus = 0
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 8192 from document
> Read 7135 from document
> Read a total of 80863 bytes
>  size = 80863
> ---------------end of extract----------------------
>
> But after some time I get this:
> Deleted, no excerpt: 0/http://interligne.xxxx.fr/5/5.1
> and I get the same message for ALL the files after that.
>
> On the console I get this:
> DB2 problem.... : PANIC: Invalid argument
> /opt/www/htdig/bin/rundig[36]: 8043 Memory fault(coredump)
>
> and later, four times:
> DB2 problem...: missing or empty key value specified.
>
>
> I think I have made a mistake with the temporary files, but I can't find
> the parameter...
>
> Does anyone have an idea?
>
> Thanks.
>
> Hervé
>
>
> -------------------------------------------------------
> This SF.Net email is sponsored by BEA Weblogic Workshop
> FREE Java Enterprise J2EE developer tools!
> Get your free copy of BEA WebLogic Workshop 8.1 today.
> http://ads.osdn.com/?ad_idP47&alloc_id808&op=click
> _______________________________________________
> ht://Dig general mailing list: <[EMAIL PROTECTED]>
> ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
> List information (subscribe/unsubscribe, etc.)
> https://lists.sourceforge.net/lists/listinfo/htdig-general
>
>


