PDF Box can't parse document: java.lang.NullPointerException
------------------------------------------------------------

         Key: NUTCH-220
         URL: http://issues.apache.org/jira/browse/NUTCH-220
     Project: Nutch
        Type: Bug
 Environment: PDFBox 0.7.2
    Reporter: Richard Braman


This error has been fixed in the latest build of PDFBox, which should be tested with
Nutch.

>> 060228 160354 fetch okay, but can't parse
>> http://www.mstc.state.ms.us/info/stats/transfer/tran0704.pdf, reason:
>> failed(2,0): Can't be handled as pdf document. 
>> java.lang.NullPointerException

Yes, the NPE should be fixed.

 Ben

Richard Braman wrote:
> Hi Ben,
>
> We actually got to the bottom of all of them except for one. The
> content truncation was due to an inconsistency bug in the Nutch config.
> The "no permission to extract text" error is actually correct: the author,
> the NC Department of Revenue, put this restriction on all of their files
> (I have asked them to remove it, as it hampers public accessibility). The
> NullPointerException is the only one left to deal with, and it may be due
> to the parsing bug. Is this the one you are referring to?
>
> -----Original Message-----
> From: Ben Litchfield [mailto:[EMAIL PROTECTED]
> Sent: Thursday, March 02, 2006 4:07 PM
> To: Richard Braman
> Cc: [email protected]; [email protected];
> [EMAIL PROTECTED]
> Subject: Re: [PDFBox-user] PDF Parse Error
>
>
>
> I believe these errors are due to a parsing bug in PDFBox that has
> been fixed since the 0.7.2 release. Please give the nightly
> build (it should be a drop-in replacement) a try from
> http://www.pdfbox.org/dist and let me know if you are still having
> issues.
>
> Ben

-- 
This message is automatically generated by JIRA.