[ https://issues.apache.org/jira/browse/PDFBOX-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Timo Boehme resolved PDFBOX-1331.
---------------------------------

       Resolution: Duplicate
    Fix Version/s: 1.8.0 (was: 1.7.0)

OK, this issue has more interesting aspects than I first thought (thanks for insisting on it).

First, it is true that the document is broken. Some readers hide error messages from the user, but xpdf, for instance, will show them (you can test it yourself, e.g. with object 249, where the length is some bytes too large). Since NonSequentialPDFParser currently does not try to repair such documents, it will fail.

For the standard parser, 1.7.0 introduced an improvement when parsing streams by using the length value. Now, if this value is wrong, it fails too. The solution would be to fall back to the old (scanning) stream parsing in such cases. I have created a dedicated issue for this improvement, PDFBOX-1333.

There is another (independent) issue with PDFStreamEngine.getFonts if no fonts are defined. I will track this in a new issue.

Resolved as a duplicate of PDFBOX-1333; with that patch the document can be parsed with the standard parser.

> Can't load any text when font is null
> -------------------------------------
>
>                 Key: PDFBOX-1331
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1331
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 1.7.0, 1.8.0
>         Environment: JDK 1.6 64bit
>            Reporter: philip huang
>            Assignee: Timo Boehme
>             Fix For: 1.8.0
>
>         Attachments: 19472133.PDF
>
>
> Open 19472133.PDF PdfboxReader without "-nonSeq" parameter.
> Turn to page 3, many NullPointerExceptions are displayed, and pdfviewer can't show any text.
> java.lang.NullPointerException
>     at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:366)
>     at org.apache.pdfbox.util.operator.ShowTextGlyph.process(ShowTextGlyph.java:62)
>     at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:556)
>     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:270)
>     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:246)
>     at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:217)
>     at org.apache.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:119)
>     at org.apache.pdfbox.pdfviewer.PDFPagePanel.paint(PDFPagePanel.java:98)
> java.util.EmptyStackException
>     at java.util.Stack.peek(Stack.java:85)
>     at org.apache.pdfbox.util.PDFStreamEngine.getFonts(PDFStreamEngine.java:601)
>     at org.apache.pdfbox.util.operator.SetTextFont.process(SetTextFont.java:54)
>     at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:556)
>     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:270)
>     at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:246)
>     at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:217)
>     at org.apache.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:119)
>     at org.apache.pdfbox.pdfviewer.PDFPagePanel.paint(PDFPagePanel.java:98)
>     at javax.swing.JComponent.paintChildren(JComponent.java:862)
>
> Open document with "-nonSeq" parameter
> Exception in thread "main" java.io.IOException: Error reading stream using length value. Expected='endstream' actual=''
>     at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseCOSStream(NonSequentialPDFParser.java:1327)
>     at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseObjectDynamically(NonSequentialPDFParser.java:1032)
>     at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseObjectDynamically(NonSequentialPDFParser.java:955)
>     at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseDictObjects(NonSequentialPDFParser.java:929)
>     at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.initialParse(NonSequentialPDFParser.java:337)
>     at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parse(NonSequentialPDFParser.java:574)
>     at org.apache.pdfbox.pdmodel.PDDocument.loadNonSeq(PDDocument.java:1124)
>     at org.apache.pdfbox.PDFReader.parseDocument(PDFReader.java:378)
>     at org.apache.pdfbox.PDFReader.openPDFFile(PDFReader.java:319)
>     at org.apache.pdfbox.PDFReader.main(PDFReader.java:305)
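
For illustration only, the fallback described in the resolution comment (trust the declared length first, and scan for the `endstream` keyword when that length turns out to be wrong) might look roughly like the sketch below. This is not the PDFBOX-1333 patch and not PDFBox API: the class `StreamLengthFallback`, the method `readStreamBody` and its helpers are hypothetical names, and the whole file is assumed to be in memory as a byte array.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch; none of these names exist in PDFBox.
final class StreamLengthFallback {
    private static final byte[] ENDSTREAM =
            "endstream".getBytes(StandardCharsets.US_ASCII);

    /**
     * Returns the stream body that begins at bodyStart.
     * declaredLength is the value of the stream dictionary's /Length entry.
     */
    static byte[] readStreamBody(byte[] data, int bodyStart, int declaredLength) {
        long end = (long) bodyStart + declaredLength;
        // Fast path: accept the declared length only if "endstream"
        // follows the body (after an optional end-of-line marker).
        if (declaredLength >= 0 && end <= data.length
                && startsWithKeyword(data, skipEol(data, (int) end))) {
            return Arrays.copyOfRange(data, bodyStart, (int) end);
        }
        // Fallback: the declared length is wrong, so scan for the keyword.
        int keyword = indexOf(data, ENDSTREAM, bodyStart);
        if (keyword < 0) {
            throw new IllegalArgumentException("no 'endstream' keyword found");
        }
        return Arrays.copyOfRange(data, bodyStart, stripEol(data, bodyStart, keyword));
    }

    // Skips one optional CR and one optional LF starting at pos.
    private static int skipEol(byte[] d, int pos) {
        if (pos < d.length && d[pos] == '\r') pos++;
        if (pos < d.length && d[pos] == '\n') pos++;
        return pos;
    }

    // Drops one trailing end-of-line marker that precedes the keyword.
    private static int stripEol(byte[] d, int start, int end) {
        if (end > start && d[end - 1] == '\n') end--;
        if (end > start && d[end - 1] == '\r') end--;
        return end;
    }

    private static boolean startsWithKeyword(byte[] d, int pos) {
        return pos + ENDSTREAM.length <= d.length
                && indexOf(d, ENDSTREAM, pos) == pos;
    }

    // Naive forward search; returns -1 when the pattern is absent.
    private static int indexOf(byte[] d, byte[] pattern, int from) {
        for (int i = Math.max(from, 0); i + pattern.length <= d.length; i++) {
            int j = 0;
            while (j < pattern.length && d[i + j] == pattern[j]) j++;
            if (j == pattern.length) return i;
        }
        return -1;
    }
}
```

With input such as `stream\nHello\nendstream` and a body offset of 7, a declared length of 5 takes the fast path, while a length that is some bytes too large (as reported for object 249) takes the scanning path; both should yield the same body. Note that scanning is only a heuristic: it can cut the body short if the stream data itself happens to contain the bytes `endstream`.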