[jira] [Commented] (PDFBOX-4927) IllegalStateException: Expected 'Page' but found COSName{Annot} in PDPageTree.sanitizeType

Jira Thu, 30 Jul 2020 22:28:43 -0700


    [ https://issues.apache.org/jira/browse/PDFBOX-4927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17168408#comment-17168408 ]


Andreas Lehmkühler commented on PDFBOX-4927:
--------------------------------------------

According to the PDF spec, the offsets within an object stream shall be in 
ascending order, but obviously we can't rely on that. Because the parser reads 
the stream sequentially, it needs those offsets in ascending order; otherwise 
the objects get mixed up. I've added a TreeMap that sorts the offsets to ensure 
the needed ordering.
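
For illustration only, a minimal sketch of the idea of sorting by offset with a 
TreeMap. This is not the actual PDFBox change: the class name 
ObjectStreamOffsets, the method sortByOffset and the flat long[] header layout 
(object number, offset, object number, offset, ...) are assumptions made for 
the example.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of PDFBox.
class ObjectStreamOffsets {

    /**
     * Takes the (object number, offset) pairs in the order they appear in the
     * object stream header and returns them keyed by offset. A TreeMap iterates
     * its keys in ascending order, so a sequential reader can walk the entries
     * front to back even if the header listed the pairs out of order.
     */
    static TreeMap<Integer, Long> sortByOffset(long[] header) {
        if (header.length % 2 != 0) {
            throw new IllegalArgumentException("header must hold complete pairs");
        }
        TreeMap<Integer, Long> byOffset = new TreeMap<>();
        for (int i = 0; i < header.length; i += 2) {
            long objectNumber = header[i];
            int offset = Math.toIntExact(header[i + 1]);
            byOffset.put(offset, objectNumber);
        }
        return byOffset;
    }

    public static void main(String[] args) {
        // Pairs deliberately listed out of offset order: 12@40, 11@0, 13@25.
        long[] header = {12, 40, 11, 0, 13, 25};
        for (Map.Entry<Integer, Long> e : sortByOffset(header).entrySet()) {
            System.out.println("offset " + e.getKey() + " -> object " + e.getValue());
        }
    }
}
```

Keying the map by offset rather than by object number is what yields the 
reading order: iteration visits offsets 0, 25, 40 regardless of how the header 
listed them.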

> IllegalStateException: Expected 'Page' but found COSName{Annot} in PDPageTree.sanitizeType
> ------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-4927
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4927
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.21
>            Reporter: Tilman Hausherr
>            Assignee: Andreas Lehmkühler
>            Priority: Major
>              Labels: regression
>         Attachments: 3DDNDTVSP354Z72MXOJKUXVDNN7LFCPY.pdf
>
>
> {noformat}
> Exception in thread "main" java.lang.IllegalStateException: Expected 'Page' but found COSName{Annot}
>         at org.apache.pdfbox.pdmodel.PDPageTree.sanitizeType(PDPageTree.java:250)
>         at org.apache.pdfbox.pdmodel.PDPageTree.access$300(PDPageTree.java:41)
>         at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.next(PDPageTree.java:210)
>         at org.apache.pdfbox.pdmodel.PDPageTree$PageIterator.next(PDPageTree.java:170)
>         at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:320)
>         at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:272)
>         at org.apache.pdfbox.tools.ExtractText.extractPages(ExtractText.java:377)
>         at org.apache.pdfbox.tools.ExtractText.startExtraction(ExtractText.java:274)
>         at org.apache.pdfbox.tools.ExtractText.main(ExtractText.java:97)
>         at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:60)
> {noformat}
> The file works in 2.0.20 and in the trunk.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
