Gábor Stefanik created PDFBOX-5879:
--------------------------------------
Summary: Regression from PDFBOX-5841: Text extraction with
rotation magic fails for PDF with multiple content streams in a page
Key: PDFBOX-5879
URL: https://issues.apache.org/jira/browse/PDFBOX-5879
Project: PDFBox
Issue Type: Bug
Components: Text extraction
Affects Versions: 3.0.3 PDFBox
Reporter: Gábor Stefanik
Attachments: MVM_Aram_augusztus.pdf
{code:java}
java -jar pdfbox-app-3.0.3.jar export:text -console -rotationMagic -i="MVM_Aram_augusztus.pdf"
{code}
This command fails with the following error:
{code:java}
java.lang.ClassCastException: class org.apache.pdfbox.cos.COSObject cannot be cast to class org.apache.pdfbox.cos.COSArray (org.apache.pdfbox.cos.COSObject and org.apache.pdfbox.cos.COSArray are in unnamed module of loader 'app')
    at org.apache.pdfbox.tools.ExtractText.extractPages(ExtractText.java:336)
    at org.apache.pdfbox.tools.ExtractText.call(ExtractText.java:225)
    at org.apache.pdfbox.tools.ExtractText.call(ExtractText.java:62)
    at picocli.CommandLine.executeUserObject(CommandLine.java:2045)
    at picocli.CommandLine.access$1500(CommandLine.java:148)
    at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2465)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2457)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2419)
    at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2277)
    at picocli.CommandLine$RunLast.execute(CommandLine.java:2421)
    at picocli.CommandLine.execute(CommandLine.java:2174)
    at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:76)
{code}
The same command succeeds in 3.0.2.
The triggering PDF can be downloaded from
[https://nagykorosiallatmentok.hu/wp-content/uploads/2023/09/MVM_Aram_augusztus.pdf]
and is also attached.
The root cause appears to be this change from PDFBOX-5841:
[https://github.com/apache/pdfbox/commit/b03d12d56dd74e5c52d80cf0b80c5bfb1f3209b2]
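
The stack trace only shows that a COSObject was cast to COSArray at ExtractText.java:336; the statement itself is not quoted in this report. One plausible reading, stated here as an assumption, is that the page's Contents entry is an indirect reference to an array of content streams and is cast without being resolved first. The sketch below models that situation with stand-in classes. ContentsModel, Node, ArrayNode, IndirectRef, dereference and removeFirstStream are hypothetical names for illustration, not PDFBox types or methods.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in model, NOT PDFBox classes: just enough structure to show the failure mode.
class ContentsModel {
    /** Base type for every object in the model (plays the role of COSBase). */
    interface Node {}

    /** An array of content streams (plays the role of COSArray). */
    static final class ArrayNode implements Node {
        final List<String> items = new ArrayList<>();
    }

    /** An indirect reference to another object (plays the role of COSObject). */
    static final class IndirectRef implements Node {
        private final Node target;
        IndirectRef(Node target) { this.target = target; }
        Node resolve() { return target; }
    }

    /** Follows indirect references until a direct object (or null) is reached. */
    static Node dereference(Node node) {
        while (node instanceof IndirectRef) {
            node = ((IndirectRef) node).resolve();
        }
        return node;
    }

    /** Removes the first stream; returns false when the entry is not a non-empty array. */
    static boolean removeFirstStream(Node contentsEntry) {
        Node direct = dereference(contentsEntry);
        if (!(direct instanceof ArrayNode)) {
            return false; // single stream or missing entry: nothing to remove here
        }
        List<String> items = ((ArrayNode) direct).items;
        if (items.isEmpty()) {
            return false;
        }
        items.remove(0);
        return true;
    }

    public static void main(String[] args) {
        ArrayNode streams = new ArrayNode();
        streams.items.add("prepended rotation");
        streams.items.add("original stream 1");
        streams.items.add("original stream 2");
        Node entry = new IndirectRef(streams); // the array is stored indirectly

        try {
            // Casting the raw entry fails because it is a reference, not the array.
            ArrayNode broken = (ArrayNode) entry;
            System.out.println("unexpected: " + broken.items);
        } catch (ClassCastException e) {
            System.out.println("raw cast failed: ClassCastException");
        }

        // Resolving the reference first makes the same operation safe.
        System.out.println("removed: " + removeFirstStream(entry));
        System.out.println("remaining: " + streams.items);
    }
}
```

In PDFBox itself, COSDictionary.getItem returns the raw entry, which may be a COSObject, while COSDictionary.getDictionaryObject resolves indirect references before returning. Whether switching accessors is the appropriate fix for line 336 is for the maintainers to confirm against the linked commit.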