[ https://issues.apache.org/jira/browse/NUTCH-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579732#action_12579732 ]
Hudson commented on NUTCH-220:
------------------------------

Integrated in Nutch-trunk #393 (See [http://hudson.zones.apache.org/hudson/job/Nutch-trunk/393/])

> PDF Box can't parse document: java.lang.NullPointerException
> ------------------------------------------------------------
>
>                 Key: NUTCH-220
>                 URL: https://issues.apache.org/jira/browse/NUTCH-220
>             Project: Nutch
>          Issue Type: Bug
>         Environment: PDFBox 0.7.2
>            Reporter: Richard Braman
>            Assignee: Andrzej Bialecki
>             Fix For: 1.0.0
>
>
> This error was fixed in the latest build of PDFBox, which should be tested
> with Nutch.
> >> 060228 160354 fetch okay, but can't parse
> >> http://www.mstc.state.ms.us/info/stats/transfer/tran0704.pdf, reason:
> >> failed(2,0): Can't be handled as pdf document.
> >> java.lang.NullPointerException
> Yes, the NPE should be fixed.
> Ben
> Richard Braman wrote:
> > Hi Ben,
> >
> > We actually got to the bottom of all of them except for one. The
> > content truncation was due to an inconsistency bug in the Nutch config.
> > The "no permission to extract text" error is actually correct: the author,
> > the NC Department of Revenue, put this restriction on all of their files
> > (I have asked them to remove it, as it hampers public accessibility). The
> > null pointer exception is the only one left to deal with that may be due
> > to the parsing bug. Is this the one that you are referring to?
> >
> > -----Original Message-----
> > From: Ben Litchfield [mailto:[EMAIL PROTECTED]
> > Sent: Thursday, March 02, 2006 4:07 PM
> > To: Richard Braman
> > Cc: nutch-dev@lucene.apache.org; [EMAIL PROTECTED];
> > [EMAIL PROTECTED]
> > Subject: Re: [PDFBox-user] PDF Parse Error
> >
> >
> >
> > I believe these errors are due to a parsing bug in PDFBox that has
> > been fixed since the 0.7.2 release. Please give the nightly
> > build (should be a drop-in replacement) a try from
> > http://www.pdfbox.org/dist and let me know if you are still having
> > issues.
> >
> > Ben

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.