I forgot to mention: the largest PDF document is 29 MB, and I have set max_doc_size to 50 MB. The core file is indeed from pdftotext. Any ideas on how to make the indexer skip these files?
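One possible way to skip such files, sketched here under assumptions rather than taken from the ht://Dig documentation: run pdftotext through a small wrapper that returns empty text whenever the converter fails, crashes, or hangs, so a bad PDF simply contributes nothing to the index. The function name `convert_pdf`, the 60-second timeout, and the `latin-1` decoding are illustrative choices, not ht://Dig settings. How the wrapper is hooked into htdig (for example through the `external_parsers` attribute) and what output format htdig expects from it should be checked against the ht://Dig documentation.

```python
import subprocess
import sys


def convert_pdf(pdf_path, timeout_s=60, converter="pdftotext"):
    """Return the extracted text, or "" if the converter fails, crashes, or hangs.

    Hypothetical helper: the name, the timeout, and the encoding are assumptions.
    """
    try:
        result = subprocess.run(
            [converter, pdf_path, "-"],  # "-" asks pdftotext to write to stdout
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,   # hide the "PDF file is damaged" noise
            timeout=timeout_s,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Converter not found, not executable, or still running after timeout_s.
        return ""
    if result.returncode != 0:
        # Non-zero exit status, or a negative value if a signal killed the process.
        return ""
    return result.stdout.decode("latin-1", errors="replace")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print whatever text could be extracted; print nothing for a bad PDF.
    sys.stdout.write(convert_pdf(sys.argv[1]))
```

The wrapper does not stop the operating system from writing a core file when pdftotext crashes. Running `ulimit -c 0` in the shell that starts rundig should suppress the core dumps themselves.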
>From: Gilles Detillieux <[EMAIL PROTECTED]>
>To: [EMAIL PROTECTED] (garsila Ndzmande)
>CC: [EMAIL PROTECTED] (ht://Dig mailing list)
>Subject: Re: [htdig] rundig fails during PDF indexing
>Date: Fri, 25 Jan 2002 15:33:46 -0600 (CST)
>
>According to garsila Ndzmande:
> > How can I prevent rundig from crashing and dumping core, due to Bad or
> > protected PDF files?
> >
> > Error (0): PDF file is damaged - attempting to reconstruct xref table...
> > Error: Kid object (page 4) is wrong type (null)
> > Error: Page count in top-level pages object is incorrect
> > Error: Couldn't read page catalog
> > Error: Unknown Type 0 character set: Adobe-Identity
> > Error (0): PDF file is damaged - attempting to reconstruct xref table...
> > Error: Dictionary key must be a name object
> > Error: End of file inside dictionary
> > Error: End of file inside dictionary
> > Error: font resource is not a dictionary
> > Error: Weird page contents
> > Error (49721): Illegal character ')'
>
>Set your max_doc_size appropriately. See http://www.htdig.org/FAQ.html#q5.2
>
>This problem shouldn't cause htdig to dump core, although perhaps pdftotext
>might.
>
>--
>Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
>Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
>Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
>Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html

