[ 
https://issues.apache.org/jira/browse/NUTCH-220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki  closed NUTCH-220.
-----------------------------------

       Resolution: Fixed
    Fix Version/s: 1.0.0
         Assignee: Andrzej Bialecki 

> PDF Box can't parse document: java.lang.NullPointerException
> ------------------------------------------------------------
>
>                 Key: NUTCH-220
>                 URL: https://issues.apache.org/jira/browse/NUTCH-220
>             Project: Nutch
>          Issue Type: Bug
>         Environment: PDFBox 0.7.2
>            Reporter: Richard Braman
>            Assignee: Andrzej Bialecki 
>             Fix For: 1.0.0
>
>
> This error was fixed in the latest build of PDFBox, which should be tested
> with Nutch.
> >> 060228 160354 fetch okay, but can't parse
> >> http://www.mstc.state.ms.us/info/stats/transfer/tran0704.pdf, reason:
> >> failed(2,0): Can't be handled as pdf document. 
> >> java.lang.NullPointerException
> Yes, the NPE should be fixed.
>  Ben
> Richard Braman wrote:
> > Hi Ben,
> >
> > We actually got to the bottom of all of them except for one. The
> > content truncation was due to an inconsistency bug in the Nutch config.
> > The "no permission to extract text" error is actually correct: the author,
> > the NC Department of Revenue, put this restriction on all of their files
> > (I have asked them to remove it, as it hampers public accessibility). The
> > NullPointerException is the only one left to deal with, and it may be due
> > to the parsing bug. Is this the one you are referring to?
> >
> > -----Original Message-----
> > From: Ben Litchfield [mailto:[EMAIL PROTECTED]
> > Sent: Thursday, March 02, 2006 4:07 PM
> > To: Richard Braman
> > Cc: nutch-dev@lucene.apache.org; [EMAIL PROTECTED];
> > [EMAIL PROTECTED]
> > Subject: Re: [PDFBox-user] PDF Parse Error
> >
> >
> >
> > I believe these errors are due to a parsing bug in PDFBox that has 
> > been fixed since the 0.7.2 release.  Please give the nightly
> > build (should be a drop-in replacement) a try from
> > http://www.pdfbox.org/dist and let me know if you are still having 
> > issues.
> >
> > Ben

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
