Reinhard,

The root element in your PDF references object 1554 as the object that 
describes the pages within this document.  That object does not seem to 
exist in the PDF, which violates the PDF spec and is why PDFBox is unable 
to parse it.  If you open the PDF in a decent text editor and search for 
1554, you'll see the Pages section which references this object, but 
that is the only place it is found; there is no object definition.
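
If you would rather not search by hand, the same check can be sketched in 
a few lines of Java.  This is only a rough illustration, not part of 
PDFBox: the class name PdfObjectScan and its methods are made up for this 
example.  It scans the raw bytes for "N G obj" definitions and "N G R" 
references, so it assumes the object is not stored inside a compressed 
object stream (where it would not be visible as plain text).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a PDFBox class: counts textual definitions of
// and references to one object number in a raw PDF file.
public final class PdfObjectScan {

    // ISO-8859-1 maps every byte to exactly one char, so binary stream
    // data survives the decode and can be searched with a regex.
    private static String decode(byte[] pdf) {
        return new String(pdf, StandardCharsets.ISO_8859_1);
    }

    // Counts non-overlapping matches of the regex in the text.
    private static int count(String text, String regex) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    /** Number of "objNum G obj" definitions found in the raw bytes. */
    public static int countDefinitions(byte[] pdf, int objNum) {
        // The lookbehind stops 554 from matching inside 1554.
        return count(decode(pdf), "(?<![0-9])" + objNum + "\\s+\\d+\\s+obj\\b");
    }

    /** Number of "objNum G R" indirect references found in the raw bytes. */
    public static int countReferences(byte[] pdf, int objNum) {
        return count(decode(pdf), "(?<![0-9])" + objNum + "\\s+\\d+\\s+R\\b");
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java PdfObjectScan <file.pdf> <object number>");
            return;
        }
        byte[] pdf = Files.readAllBytes(Paths.get(args[0]));
        int objNum = Integer.parseInt(args[1]);
        System.out.println("definitions: " + countDefinitions(pdf, objNum));
        System.out.println("references:  " + countReferences(pdf, objNum));
    }
}
```

Running it against the file with 1554 as the object number should report 
at least one reference; zero definitions would match what I saw in the 
text editor.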

Having said that, if we can find a reliable way to parse files like 
these, we can update the code.  Do you know what program was used to 
create this PDF?  Would it be possible for you to remove the encryption 
from this file and try again?  That would make it much easier to debug, 
assuming it still crashes without the encryption (it might not).

I also encourage you to create an issue in JIRA and upload this file there 
(in case the link dies in the future):  https://issues.apache.org/jira

---- 
Thanks,
Adam





From: reinhard schwab <[email protected]>
To: [email protected]
Date: 08/21/2010 11:42
Subject: NPE in PDPageNode



I get a NullPointerException when parsing a PDF with Tika.

http://www.awsg.at/portal/media/4218.pdf

java.lang.NullPointerException
    at org.apache.pdfbox.pdmodel.PDPageNode.getCount(PDPageNode.java:109)
    at org.apache.pdfbox.pdmodel.PDDocument.getNumberOfPages(PDDocument.java:943)
    at org.apache.tika.parser.pdf.PDFParser.extractMetadata(PDFParser.java:105)
    at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:86)


regards
reinhard






