[
https://issues.apache.org/jira/browse/PDFBOX-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andreas Lehmkühler closed PDFBOX-5879.
--------------------------------------
> Regression from PDFBOX-5841: Text extraction with rotation magic fails for
> PDF with multiple content streams in a page
> ----------------------------------------------------------------------------------------------------------------------
>
> Key: PDFBOX-5879
> URL: https://issues.apache.org/jira/browse/PDFBOX-5879
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 2.0.32, 3.0.3 PDFBox
> Reporter: Gábor Stefanik
> Assignee: Tilman Hausherr
> Priority: Major
> Fix For: 2.0.33, 3.0.4 PDFBox, 4.0.0
>
> Attachments: MVM_Aram_augusztus.pdf
>
>
> {code:java}
> java -jar pdfbox-app-3.0.3.jar export:text -console -rotationMagic -i="MVM_Aram_augusztus.pdf"
> {code}
> fails with the following error:
> {code:java}
> java.lang.ClassCastException: class org.apache.pdfbox.cos.COSObject cannot be cast to class org.apache.pdfbox.cos.COSArray (org.apache.pdfbox.cos.COSObject and org.apache.pdfbox.cos.COSArray are in unnamed module of loader 'app')
>     at org.apache.pdfbox.tools.ExtractText.extractPages(ExtractText.java:336)
>     at org.apache.pdfbox.tools.ExtractText.call(ExtractText.java:225)
>     at org.apache.pdfbox.tools.ExtractText.call(ExtractText.java:62)
>     at picocli.CommandLine.executeUserObject(CommandLine.java:2045)
>     at picocli.CommandLine.access$1500(CommandLine.java:148)
>     at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2465)
>     at picocli.CommandLine$RunLast.handle(CommandLine.java:2457)
>     at picocli.CommandLine$RunLast.handle(CommandLine.java:2419)
>     at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2277)
>     at picocli.CommandLine$RunLast.execute(CommandLine.java:2421)
>     at picocli.CommandLine.execute(CommandLine.java:2174)
>     at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:76)
> {code}
> The same command succeeds in 3.0.2.
> The triggering PDF can be downloaded from
> [https://nagykorosiallatmentok.hu/wp-content/uploads/2023/09/MVM_Aram_augusztus.pdf],
> and is also attached.
> The root cause appears to be this change:
> [https://github.com/apache/pdfbox/commit/b03d12d56dd74e5c52d80cf0b80c5bfb1f3209b2]
> from PDFBOX-5841
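
The exception text suggests one plausible failure mode: a page's contents entry is reached through an indirect object and is cast straight to an array without being dereferenced first. The sketch below only illustrates that pattern. It does not use PDFBox; `StreamObj`, `ArrayObj`, `Ref`, `resolve`, `countStreams` and `countStreamsUnsafe` are hypothetical stand-ins invented for this example, and it is not the code of the linked commit or of the eventual fix.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins, NOT the PDFBox COS classes: they only model the
// difference between a direct value and an indirect reference to one.
final class ContentsDemo {
    interface Base {}

    static final class StreamObj implements Base {
        final String data;
        StreamObj(String data) { this.data = data; }
    }

    static final class ArrayObj implements Base {
        final List<Base> items;
        ArrayObj(Base... items) { this.items = Arrays.asList(items); }
    }

    // Models an indirect reference wrapping another object.
    static final class Ref implements Base {
        final Base target;
        Ref(Base target) { this.target = target; }
    }

    // Follow indirect references until a direct object is reached.
    static Base resolve(Base b) {
        while (b instanceof Ref) {
            b = ((Ref) b).target;
        }
        return b;
    }

    // Counts content streams whether the entry is a single stream or an
    // array, and whether or not it (or an element) sits behind a reference.
    static int countStreams(Base contents) {
        Base direct = resolve(contents);
        if (direct instanceof StreamObj) {
            return 1;
        }
        if (direct instanceof ArrayObj) {
            int n = 0;
            for (Base item : ((ArrayObj) direct).items) {
                if (resolve(item) instanceof StreamObj) {
                    n++;
                }
            }
            return n;
        }
        return 0;
    }

    // Mirrors the assumed failure mode: an unchecked cast that only works
    // when the entry happens to be a direct array.
    static int countStreamsUnsafe(Base contents) {
        return ((ArrayObj) contents).items.size();
    }

    public static void main(String[] args) {
        Base contents = new Ref(new ArrayObj(new Ref(new StreamObj("a")), new StreamObj("b")));
        System.out.println(countStreams(contents));
        try {
            countStreamsUnsafe(contents);
        } catch (ClassCastException e) {
            System.out.println("unchecked cast failed");
        }
    }
}
```

Checking the resolved type with `instanceof` before casting keeps the single-stream, direct-array and referenced-array cases on the same code path, which is why the sketch resolves first and casts second.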
--
This message was sent by Atlassian Jira
(v8.20.10#820010)