PDF Box can't parse document: java.lang.NullPointerException
------------------------------------------------------------
Key: NUTCH-220
URL: http://issues.apache.org/jira/browse/NUTCH-220
Project: Nutch
Type: Bug
Environment: PDFBox 0.7.2
Reporter: Richard Braman
This error was fixed in the latest build of PDFBox, which should be tested with
Nutch.
>> 060228 160354 fetch okay, but can't parse
>> http://www.mstc.state.ms.us/info/stats/transfer/tran0704.pdf, reason:
>> failed(2,0): Can't be handled as pdf document.
>> java.lang.NullPointerException
Yes, the NPE should be fixed.
Ben
Richard Braman wrote:
> Hi Ben,
>
> We actually got to the bottom of all of them except for one. The
> content truncation was due to an inconsistency bug in the Nutch config.
> The "no permission to extract text" error is actually accurate: the
> author, the NC Department of Revenue, put this restriction on all of
> their files (I have asked them to remove it, as it hampers public
> accessibility). The null pointer exception is the only one left to deal
> with, and it may be due to the parsing bug. Is this the one that you
> are referring to?
>
> -----Original Message-----
> From: Ben Litchfield [mailto:[EMAIL PROTECTED]
> Sent: Thursday, March 02, 2006 4:07 PM
> To: Richard Braman
> Cc: [email protected]; [email protected];
> [EMAIL PROTECTED]
> Subject: Re: [PDFBox-user] PDF Parse Error
>
>
>
> I believe these errors are due to a parsing bug in PDFBox that has
> been fixed since the 0.7.2 release. Please give the nightly
> build (should be a drop-in replacement) a try from
> http://www.pdfbox.org/dist and let me know if you are still having
> issues.
>
> Ben