[ https://issues.apache.org/jira/browse/PDFBOX-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882327#comment-17882327 ]
Tilman Hausherr edited comment on PDFBOX-5879 at 9/17/24 9:08 AM:
------------------------------------------------------------------

I added a simple test for the rotationMagic feature because it turns out we didn't have any. However, this isn't a test of the fixed bug; creating a file for that would have been more difficult, and there is no risk that this fix gets reverted anyway.

was (Author: tilman):
I added a simple test for the feature because it turns out we didn't have any. However, this isn't a test of the fixed bug; creating a file for that would have been more difficult, and there is no risk that this fix gets reverted anyway.

> Regression from PDFBOX-5841: Text extraction with rotation magic fails for PDF with multiple content streams in a page
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-5879
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5879
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.32, 3.0.3 PDFBox
>            Reporter: Gábor Stefanik
>            Assignee: Tilman Hausherr
>            Priority: Major
>             Fix For: 2.0.33, 3.0.4 PDFBox, 4.0.0
>
>         Attachments: MVM_Aram_augusztus.pdf
>
>
> {code:java}
> java -jar pdfbox-app-3.0.3.jar export:text -console -rotationMagic -i="MVM_Aram_augusztus.pdf"
> {code}
> fails with the following error:
> {code:java}
> java.lang.ClassCastException: class org.apache.pdfbox.cos.COSObject cannot be cast to class org.apache.pdfbox.cos.COSArray (org.apache.pdfbox.cos.COSObject and org.apache.pdfbox.cos.COSArray are in unnamed module of loader 'app')
>     at org.apache.pdfbox.tools.ExtractText.extractPages(ExtractText.java:336)
>     at org.apache.pdfbox.tools.ExtractText.call(ExtractText.java:225)
>     at org.apache.pdfbox.tools.ExtractText.call(ExtractText.java:62)
>     at picocli.CommandLine.executeUserObject(CommandLine.java:2045)
>     at picocli.CommandLine.access$1500(CommandLine.java:148)
>     at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2465)
>     at picocli.CommandLine$RunLast.handle(CommandLine.java:2457)
>     at picocli.CommandLine$RunLast.handle(CommandLine.java:2419)
>     at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2277)
>     at picocli.CommandLine$RunLast.execute(CommandLine.java:2421)
>     at picocli.CommandLine.execute(CommandLine.java:2174)
>     at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:76)
> {code}
> The same command succeeds in 3.0.2.
> The triggering PDF can be downloaded from
> [https://nagykorosiallatmentok.hu/wp-content/uploads/2023/09/MVM_Aram_augusztus.pdf],
> and is also attached.
> The root cause appears to be this change:
> [https://github.com/apache/pdfbox/commit/b03d12d56dd74e5c52d80cf0b80c5bfb1f3209b2]
> from PDFBOX-5841
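
For readers unfamiliar with this failure mode: the ClassCastException above is consistent with a page's /Contents entry being stored as an indirect reference (a COSObject) while the calling code casts the raw entry straight to COSArray. The following is a minimal sketch of that pattern and of resolving the reference before inspecting the type. It uses hypothetical stand-in types (Base, Stream, Array, Ref) rather than PDFBox's COS classes, and it does not reproduce the fix actually committed for this issue. In PDFBox itself, COSDictionary.getDictionaryObject(COSName) is, as far as I know, the accessor that dereferences a COSObject, whereas getItem(COSName) returns the raw entry.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT PDFBox's COS classes.
class IndirectContentsSketch {
    interface Base {}

    // Stand-in for a content stream.
    static final class Stream implements Base {
        final String name;
        Stream(String name) { this.name = name; }
    }

    // Stand-in for an array of objects.
    static final class Array implements Base {
        final List<Base> items = new ArrayList<>();
    }

    // Stand-in for an indirect reference to another object.
    static final class Ref implements Base {
        final Base target;
        Ref(Base target) { this.target = target; }
    }

    // Follow indirect references until a direct object (or null) is reached.
    static Base resolve(Base value) {
        Base current = value;
        while (current instanceof Ref) {
            current = ((Ref) current).target;
        }
        return current;
    }

    // Collect content streams whether the entry is a single stream, an array,
    // or an indirect reference to either. Non-stream items are skipped.
    static List<Stream> contentStreams(Base contentsEntry) {
        List<Stream> result = new ArrayList<>();
        Base resolved = resolve(contentsEntry);
        if (resolved instanceof Stream) {
            result.add((Stream) resolved);
        } else if (resolved instanceof Array) {
            for (Base item : ((Array) resolved).items) {
                Base direct = resolve(item);
                if (direct instanceof Stream) {
                    result.add((Stream) direct);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Array array = new Array();
        array.items.add(new Stream("first"));
        array.items.add(new Ref(new Stream("second")));
        Base entry = new Ref(array); // the array is only reachable through a reference

        try {
            Array blind = (Array) entry; // same shape as the cast in the stack trace
            System.out.println("unexpected: " + blind.items.size());
        } catch (ClassCastException e) {
            System.out.println("blind cast failed");
        }
        System.out.println(contentStreams(entry).size()); // prints 2
    }
}
```

The point of the design is that the type check happens only after dereferencing, so a single stream, a direct array, and a referenced array all go through the same path.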