[
https://issues.apache.org/jira/browse/PDFBOX-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882327#comment-17882327
]
Tilman Hausherr edited comment on PDFBOX-5879 at 9/17/24 9:08 AM:
------------------------------------------------------------------
I added a simple test for the rotationMagic feature because it turns out we
didn't have any. However, this isn't a test of the fixed bug; creating a file
for that would have been more difficult, and there is no risk that this fix
gets reverted anyway.
> Regression from PDFBOX-5841: Text extraction with rotation magic fails for
> PDF with multiple content streams in a page
> ----------------------------------------------------------------------------------------------------------------------
>
> Key: PDFBOX-5879
> URL: https://issues.apache.org/jira/browse/PDFBOX-5879
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 2.0.32, 3.0.3 PDFBox
> Reporter: Gábor Stefanik
> Assignee: Tilman Hausherr
> Priority: Major
> Fix For: 2.0.33, 3.0.4 PDFBox, 4.0.0
>
> Attachments: MVM_Aram_augusztus.pdf
>
>
> {code:java}
> java -jar pdfbox-app-3.0.3.jar export:text -console -rotationMagic -i="MVM_Aram_augusztus.pdf"
> {code}
> fails with the following error:
> {code:java}
> java.lang.ClassCastException: class org.apache.pdfbox.cos.COSObject cannot be cast to class org.apache.pdfbox.cos.COSArray (org.apache.pdfbox.cos.COSObject and org.apache.pdfbox.cos.COSArray are in unnamed module of loader 'app')
>     at org.apache.pdfbox.tools.ExtractText.extractPages(ExtractText.java:336)
>     at org.apache.pdfbox.tools.ExtractText.call(ExtractText.java:225)
>     at org.apache.pdfbox.tools.ExtractText.call(ExtractText.java:62)
>     at picocli.CommandLine.executeUserObject(CommandLine.java:2045)
>     at picocli.CommandLine.access$1500(CommandLine.java:148)
>     at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2465)
>     at picocli.CommandLine$RunLast.handle(CommandLine.java:2457)
>     at picocli.CommandLine$RunLast.handle(CommandLine.java:2419)
>     at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2277)
>     at picocli.CommandLine$RunLast.execute(CommandLine.java:2421)
>     at picocli.CommandLine.execute(CommandLine.java:2174)
>     at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:76)
> {code}
> The same command succeeds in 3.0.2.
> The triggering PDF can be downloaded from
> [https://nagykorosiallatmentok.hu/wp-content/uploads/2023/09/MVM_Aram_augusztus.pdf],
> and is also attached.
> The root cause appears to be this change:
> [https://github.com/apache/pdfbox/commit/b03d12d56dd74e5c52d80cf0b80c5bfb1f3209b2]
> from PDFBOX-5841
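
The exception message suggests that a value stored behind an indirect
reference was cast straight to an array type. The sketch below only
illustrates that general failure mode and one defensive pattern (resolve the
reference, then check the type before casting). All types in it (Base, Array,
Stream, Ref) are hypothetical stand-ins written for this example; they are
not PDFBox classes, and this is not the actual fix that was committed.

```java
import java.util.ArrayList;
import java.util.List;

public class IndirectContentsSketch {
    /** Hypothetical stand-in for a PDF object; not a PDFBox class. */
    interface Base {}

    /** Stand-in for an array object, e.g. a page's list of content streams. */
    static final class Array implements Base {
        final List<Base> items = new ArrayList<>();
    }

    /** Stand-in for a single content stream. */
    static final class Stream implements Base {
        final String data;
        Stream(String data) { this.data = data; }
    }

    /** Stand-in for an indirect reference that wraps the real object. */
    static final class Ref implements Base {
        final Base target;
        Ref(Base target) { this.target = target; }
    }

    /** Follow indirect references until a direct object (or null) is reached. */
    static Base resolve(Base value) {
        while (value instanceof Ref) {
            value = ((Ref) value).target;
        }
        return value;
    }

    /** Count the streams behind a Contents-like entry without blind casts. */
    static int countStreams(Base contentsEntry) {
        Base direct = resolve(contentsEntry);
        if (direct instanceof Array) {
            return ((Array) direct).items.size();
        }
        if (direct instanceof Stream) {
            return 1;
        }
        return 0; // missing entry or unexpected type
    }

    public static void main(String[] args) {
        Array streams = new Array();
        streams.items.add(new Stream("q"));
        streams.items.add(new Stream("Q"));
        Base entry = new Ref(streams); // array stored behind an indirect reference

        try {
            Array unsafe = (Array) entry; // blind cast on the unresolved reference
            System.out.println("unexpected: " + unsafe.items.size());
        } catch (ClassCastException e) {
            System.out.println("blind cast failed");
        }
        System.out.println("streams: " + countStreams(entry));
    }
}
```

The type check after resolving also covers the case where the entry is a
single stream rather than an array, so neither layout needs a blind cast.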