[ 
https://issues.apache.org/jira/browse/PDFBOX-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timo Boehme resolved PDFBOX-1331.
---------------------------------

       Resolution: Duplicate
    Fix Version/s:     (was: 1.7.0)
                   1.8.0

Ok, this issue has more interesting aspects than I first thought (thanks for 
insisting on it).

First, it is true that the document is broken. Some readers hide error 
messages from the user, but xpdf, for instance, shows them (you can test it 
yourself, e.g. with object 249, where the length is some bytes too large). Since 
NonSequentialPDFParser currently does not try to repair such documents, it fails.

For the standard parser, 1.7.0 introduced an improvement that parses streams 
using the length value. If this value is wrong, the standard parser now fails 
too. The solution would be to fall back to the old (scanning) stream parsing in 
such cases. I have created a dedicated issue for this improvement: PDFBOX-1333.
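
To illustrate the fallback idea, here is a minimal, self-contained sketch. It 
is not the PDFBox implementation or the PDFBOX-1333 patch; the class and method 
names (StreamLengthFallback, readStreamBody) are hypothetical, and it works on 
a plain byte array instead of the parser's input stream. It trusts the declared 
length only if the 'endstream' keyword follows, and scans for the keyword 
otherwise.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; not part of PDFBox.
class StreamLengthFallback {
    private static final byte[] ENDSTREAM =
            "endstream".getBytes(StandardCharsets.ISO_8859_1);

    /**
     * Returns the stream body that starts at dataStart.
     * The declared length is used only if 'endstream' follows it
     * (after optional CR/LF); otherwise we scan for the keyword.
     */
    static byte[] readStreamBody(byte[] pdf, int dataStart, long declaredLength) {
        long end = dataStart + declaredLength;
        if (declaredLength >= 0 && end <= pdf.length
                && endstreamFollows(pdf, (int) end)) {
            return Arrays.copyOfRange(pdf, dataStart, (int) end);
        }
        // Fallback: the length value is wrong, so scan for the keyword.
        int pos = indexOf(pdf, ENDSTREAM, dataStart);
        if (pos < 0) {
            throw new IllegalArgumentException("no 'endstream' keyword found");
        }
        // Drop the end-of-line marker that precedes the keyword, if any.
        int bodyEnd = pos;
        if (bodyEnd > dataStart && pdf[bodyEnd - 1] == '\n') bodyEnd--;
        if (bodyEnd > dataStart && pdf[bodyEnd - 1] == '\r') bodyEnd--;
        return Arrays.copyOfRange(pdf, dataStart, bodyEnd);
    }

    private static boolean endstreamFollows(byte[] pdf, int pos) {
        while (pos < pdf.length && (pdf[pos] == '\r' || pdf[pos] == '\n')) pos++;
        return startsWith(pdf, ENDSTREAM, pos);
    }

    private static boolean startsWith(byte[] data, byte[] key, int pos) {
        if (pos < 0 || pos + key.length > data.length) return false;
        for (int i = 0; i < key.length; i++) {
            if (data[pos + i] != key[i]) return false;
        }
        return true;
    }

    private static int indexOf(byte[] data, byte[] key, int from) {
        for (int i = Math.max(from, 0); i + key.length <= data.length; i++) {
            if (startsWith(data, key, i)) return i;
        }
        return -1;
    }
}
```

Note that scanning can stop too early if the binary stream data happens to 
contain the bytes 'endstream', which is one reason to prefer the length value 
whenever it is consistent.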

There is another (independent) issue with PDFStreamEngine.getFonts if no fonts 
are defined. I will track this in a new issue.
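
As a rough illustration of this kind of failure, the sketch below shows a font 
lookup that tolerates both an empty resource stack and resources without a font 
dictionary. It is an assumption about the general shape of the problem, not the 
PDFStreamEngine code or its eventual fix; FontLookupSketch, Resources and 
pushResources are hypothetical names, and fonts are modelled as plain strings.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.Map;

// Hypothetical illustration only; not part of PDFBox.
class FontLookupSketch {
    /** Stand-in for a page's or form's resources; fonts may be null. */
    static final class Resources {
        final Map<String, String> fonts;

        Resources(Map<String, String> fonts) {
            this.fonts = fonts;
        }
    }

    private final Deque<Resources> resourceStack = new ArrayDeque<>();

    void pushResources(Resources resources) {
        resourceStack.push(resources);
    }

    /**
     * Never throws and never returns null: an empty stack or a missing
     * font dictionary both yield an empty map.
     */
    Map<String, String> getFonts() {
        // ArrayDeque.peek() returns null on an empty deque, unlike
        // java.util.Stack.peek(), which throws EmptyStackException.
        Resources top = resourceStack.peek();
        if (top == null || top.fonts == null) {
            return Collections.emptyMap();
        }
        return top.fonts;
    }
}
```

With a lookup like this, a caller that asks for a font by name gets null for an 
unknown font and still has to handle that case before showing text.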

Resolved as a duplicate of PDFBOX-1333; using this patch allows parsing the 
document with the standard parser.
                
> Can't load any text when font is null
> -------------------------------------
>
>                 Key: PDFBOX-1331
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1331
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 1.7.0, 1.8.0
>         Environment: JDK 1.6 64bit
>            Reporter: philip huang
>            Assignee: Timo Boehme
>             Fix For: 1.8.0
>
>         Attachments: 19472133.PDF
>
>
> Open 19472133.PDF PdfboxReader without "-nonSeq" parameter.
> Turn to page 3, many NullPointerExceptions are displayed, and pdfviewer can't 
> show any text.
> java.lang.NullPointerException
>       at 
> org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:366)
>       at 
> org.apache.pdfbox.util.operator.ShowTextGlyph.process(ShowTextGlyph.java:62)
>       at 
> org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:556)
>       at 
> org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:270)
>       at 
> org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:246)
>       at 
> org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:217)
>       at org.apache.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:119)
>       at org.apache.pdfbox.pdfviewer.PDFPagePanel.paint(PDFPagePanel.java:98)
> java.util.EmptyStackException
>       at java.util.Stack.peek(Stack.java:85)
>       at 
> org.apache.pdfbox.util.PDFStreamEngine.getFonts(PDFStreamEngine.java:601)
>       at 
> org.apache.pdfbox.util.operator.SetTextFont.process(SetTextFont.java:54)
>       at 
> org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:556)
>       at 
> org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:270)
>       at 
> org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:246)
>       at 
> org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:217)
>       at org.apache.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:119)
>       at org.apache.pdfbox.pdfviewer.PDFPagePanel.paint(PDFPagePanel.java:98)
>       at javax.swing.JComponent.paintChildren(JComponent.java:862)
> Open document with "-nonSeq" parameter
> Exception in thread "main" java.io.IOException: Error reading stream using 
> length value. Expected='endstream' actual='' 
>       at 
> org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseCOSStream(NonSequentialPDFParser.java:1327)
>       at 
> org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseObjectDynamically(NonSequentialPDFParser.java:1032)
>       at 
> org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseObjectDynamically(NonSequentialPDFParser.java:955)
>       at 
> org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseDictObjects(NonSequentialPDFParser.java:929)
>       at 
> org.apache.pdfbox.pdfparser.NonSequentialPDFParser.initialParse(NonSequentialPDFParser.java:337)
>       at 
> org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parse(NonSequentialPDFParser.java:574)
>       at org.apache.pdfbox.pdmodel.PDDocument.loadNonSeq(PDDocument.java:1124)
>       at org.apache.pdfbox.PDFReader.parseDocument(PDFReader.java:378)
>       at org.apache.pdfbox.PDFReader.openPDFFile(PDFReader.java:319)
>       at org.apache.pdfbox.PDFReader.main(PDFReader.java:305)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
