Good, doc2html is now working correctly.
htdig is retrieving only part of the file, and pdftotext needs the complete
document in order to extract text from it. A truncated PDF is what produces
the xref table and trailer errors you quoted.
You need to set the max_doc_size attribute in your configuration file to a
value larger than the largest .pdf file you expect to index.
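
As a rough sketch, the entry in your htdig.conf might look like the
fragment below. The 10000000 figure (about 10 MB) is only an assumed
example, not a recommendation; pick a value above the size in bytes of
your largest PDF.

```
# htdig.conf fragment (illustrative value only)
# max_doc_size is given in bytes; documents larger than this are truncated
# on retrieval, which leaves pdftotext with an incomplete PDF.
max_doc_size: 10000000
```

After changing the value, rerun rundig so the PDF files are fetched again
in full.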
--
David Adams
Computing Services
Southampton University
----- Original Message -----
From: "adele zhou" <[EMAIL PROTECTED]>
To: "David Adams" <[EMAIL PROTECTED]>;
<[EMAIL PROTECTED]>
Sent: Tuesday, July 03, 2001 3:24 AM
Subject: Re: [htdig] unable to parse pdf..
> I think I didn't explain it clearly.
> The error message is:
> [adele@localhost adele]$ /opt/www/htdig/bin/rundig
> !! Error (0): PDF file is damaged - attempting to reconstruct xref table...
> !! Error: Couldn't find trailer dictionary
> !! Error: Couldn't read xref table
> Thanks
> Adele
>
> _______________________________________________
> htdig-general mailing list <[EMAIL PROTECTED]>
> To unsubscribe, send a message to <[EMAIL PROTECTED]> with a
> subject of unsubscribe
> FAQ: http://htdig.sourceforge.net/FAQ.html
>