>
>
> Hi all. I'm trying to index PDF and postscript files, and keep receiving:
>
> Error (0): PDF file is damaged - attempting to reconstruct xref table...
> Error: Couldn't find trailer dictionary
> Error: Couldn't read xref table
>
> or
>
> Error (0): PDF file is damaged - attempting to reconstruct xref table...
> Error: Top-level pages object is wrong type (null)
> Error: Couldn't read page catalog
>
> on a handful of the dozens of PDF files in the directory I'm trying to
> index. I'm using ghostscript-6.01 and have run pdf2ps on the individual
> files in the directory with no problem.
>
> I'm using the external_parsers parameter with the parse_doc shell script
> combined with xpdf. Should I try using acroread instead?
>
> Thanks,
> Dave Wreski
>
>
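Most likely these particular PDF files are larger than the max_doc_size
attribute set in your configuration file. In that case htdig fetches only
the first max_doc_size bytes of each file and passes that fragment to the
conversion utility. A PDF keeps its xref table and trailer at the end of
the file, so a truncated PDF cannot be parsed, which matches the errors
you are seeing. pdf2ps succeeds because it reads the complete files.
Raising max_doc_size above the size of your largest PDF should help.

You should also consider switching from parse_doc to doc2html, a new
version of which should be available to download shortly.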
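
To see which files would be cut off by max_doc_size, a rough check along
these lines may help. It is only a sketch: it assumes the documents are
also available in a local directory, the function name oversized_files is
made up for illustration, and the 100000-byte limit is an assumed value
that you should replace with the one from your own configuration file.

```python
import os
import sys


def oversized_files(directory, max_doc_size=100000, suffixes=(".pdf", ".ps")):
    """Return sorted (path, size) pairs for files larger than max_doc_size bytes.

    max_doc_size is an assumed value here; use the one from your own config.
    """
    hits = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            # Only look at the document types being indexed.
            if name.lower().endswith(suffixes):
                path = os.path.join(root, name)
                size = os.path.getsize(path)
                if size > max_doc_size:
                    hits.append((path, size))
    return sorted(hits)


if __name__ == "__main__":
    # Usage: python check_sizes.py /path/to/docs
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, size in oversized_files(target):
        print(f"{size:>10}  {path}")
```

If the files it lists are the same ones that produce the xref errors, the
size limit is very likely the cause.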
--
David J Adams
<[EMAIL PROTECTED]>
Computing Services
University of Southampton