[
https://issues.apache.org/jira/browse/PDFBOX-5245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17386284#comment-17386284
]
Tilman Hausherr commented on PDFBOX-5245:
-----------------------------------------
Your file is corrupt: the document information dictionary (object 11) contains garbled entries that are not valid PDF syntax:
{noformat}
11 0 obj<</CreationDate(D:20171219132506Z)
/GTS_PDFXVersiopâ:XøU:?&
/Producer(Adobe PDF Library 8.0)
/Author(Charlesworth, Alan)
/Creator(Adobe InDesign CS3 \(5.0.4\))
/Ußúd)
/ModDate(D:20180115001223+05'30')
/Title(Digital Marketing)
/Trapped/False
/GTS_PDFXConformance(PDF/X-3:2002)
/EBX_PUBLISHER(Taylor & Francis Ltd)>>endobj
{noformat}
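The parser most likely fails on the `/Ußúd)` entry; the excerpt does not show byte offsets, so this is an assumption. A PDF name ends at the first whitespace or delimiter character. That makes the key `Ußúd`, and the value would then have to start with `)` (character code 41), which matches `c=')' cInt=41` in the exception. The toy sketch below is not PDFBox's `BaseParser`. The class `DictValueCheck` and its methods are hypothetical, and the value-start check is deliberately rough.

```java
// Toy illustration only, not PDFBox code: shows why an entry like "/Ußúd)"
// cannot be parsed as a dictionary key/value pair.
public class DictValueCheck {
    // Delimiter and whitespace characters from the PDF character set.
    private static final String DELIMITERS = "()<>[]{}/%";
    private static final String WHITESPACE = "\0\t\n\f\r ";

    // Returns the index just past the name token that starts with '/' at 'start'.
    static int endOfName(String s, int start) {
        int i = start + 1; // skip the leading '/'
        while (i < s.length()
                && DELIMITERS.indexOf(s.charAt(i)) < 0
                && WHITESPACE.indexOf(s.charAt(i)) < 0) {
            i++;
        }
        return i;
    }

    // Rough check of characters that can begin a direct object value:
    // '<' (dictionary or hex string), '[' (array), '(' (string), '/' (name),
    // a number (digit, sign, '.') or the keywords true/false/null.
    static boolean canStartValue(char c) {
        return "<[(/+-.tfn".indexOf(c) >= 0 || Character.isDigit(c);
    }

    public static void main(String[] args) {
        String entry = "/U\u00df\u00fad)"; // the damaged "/Ußúd)" entry from object 11
        int i = endOfName(entry, 0);
        // skip any whitespace between the key and its value
        while (i < entry.length() && WHITESPACE.indexOf(entry.charAt(i)) >= 0) {
            i++;
        }
        char next = entry.charAt(i);
        System.out.println("value starts with '" + next + "' (" + (int) next
                + "), valid start: " + canStartValue(next));
    }
}
```

Running it prints `value starts with ')' (41), valid start: false`.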
You could try 3.0.0 RC1, which uses a different parsing approach: it parses
objects on demand, so bad objects aren't always hit. (I was able to display a
few pages.)
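Here is a toy model of the difference between eager and on-demand parsing. It is not how PDFBox 2.0 or 3.0 is actually implemented. The class names, the object map, and the crude parenthesis-balance "parser" are all hypothetical. With eager parsing, one damaged object (here a stand-in for object 11) makes the whole load fail. With on-demand parsing, objects that are never requested are never parsed.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Toy model only, not PDFBox code: eager parsing vs. parsing on first request.
public class LazyParsingDemo {
    // Crude stand-in parser: fails on a ')' that closes no '(' (ignores escapes).
    static String parse(int num, String raw) throws IOException {
        if (raw == null) throw new IOException("No object " + num);
        int depth = 0;
        for (char c : raw.toCharArray()) {
            if (c == '(') depth++;
            else if (c == ')' && --depth < 0)
                throw new IOException("Unbalanced ')' in object " + num);
        }
        return raw; // a real parser would build COS objects here
    }

    // Eager: every object is parsed up front, so one bad object fails the load.
    static Map<Integer, String> loadEagerly(Map<Integer, String> raw) throws IOException {
        Map<Integer, String> parsed = new HashMap<>();
        for (Map.Entry<Integer, String> e : raw.entrySet()) {
            parsed.put(e.getKey(), parse(e.getKey(), e.getValue()));
        }
        return parsed;
    }

    // On demand: an object is parsed (and cached) only when it is requested.
    static final class LazyDocument {
        private final Map<Integer, String> raw;
        private final Map<Integer, String> cache = new HashMap<>();

        LazyDocument(Map<Integer, String> raw) { this.raw = raw; }

        String get(int num) throws IOException {
            String obj = cache.get(num);
            if (obj == null) {
                obj = parse(num, raw.get(num));
                cache.put(num, obj);
            }
            return obj;
        }
    }

    public static void main(String[] args) throws IOException {
        Map<Integer, String> raw = new HashMap<>();
        raw.put(1, "<< /Type /Catalog /Pages 2 0 R >>");
        raw.put(2, "<< /Type /Pages /Count 1 >>");
        raw.put(11, "<< /Title (Digital Marketing) /U\u00df\u00fad) >>"); // damaged Info dictionary
        try {
            loadEagerly(raw);
            System.out.println("eager: ok");
        } catch (IOException e) {
            System.out.println("eager: failed, " + e.getMessage());
        }
        LazyDocument doc = new LazyDocument(raw);
        System.out.println("lazy: " + doc.get(1)); // object 11 is never parsed
    }
}
```

The cache is there so that repeated requests do not re-parse an object. The damaged object only causes an error if something actually asks for it.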
> IOException: Unknown dir object c=')' cInt=41 peek=')' peekInt=41 at offset
> 8571
> ---------------------------------------------------------------------------------
>
> Key: PDFBOX-5245
> URL: https://issues.apache.org/jira/browse/PDFBOX-5245
> Project: PDFBox
> Issue Type: Bug
> Reporter: funaiy
> Priority: Major
>
> We extract text and image content from PDFs with PDFBox, but some PDF files
> throw an IOException. The PDFBox version is 2.0.24. Please help check.
>
> {code:java}
> Caused by: java.io.IOException: Unknown dir object c=')' cInt=41 peek=')' peekInt=41 at offset 8571 (start offset: 8571)
>     at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:913) ~[pdfbox-2.0.24.jar!/:2.0.24]
>     at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryValue(BaseParser.java:154) ~[pdfbox-2.0.24.jar!/:2.0.24]
>     at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryNameValuePair(BaseParser.java:288) ~[pdfbox-2.0.24.jar!/:2.0.24]
>     at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionary(BaseParser.java:218) ~[pdfbox-2.0.24.jar!/:2.0.24]
>     at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:857) ~[pdfbox-2.0.24.jar!/:2.0.24]
>     at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:907) ~[pdfbox-2.0.24.jar!/:2.0.24]
>     at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:876) ~[pdfbox-2.0.24.jar!/:2.0.24]
>     at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:796) ~[pdfbox-2.0.24.jar!/:2.0.24]
>     at org.apache.pdfbox.pdfparser.COSParser.parseTrailerValuesDynamically(COSParser.java:2858) ~[pdfbox-2.0.24.jar!/:2.0.24]
>     at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:175) ~[pdfbox-2.0.24.jar!/:2.0.24]
>     at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:226) ~[pdfbox-2.0.24.jar!/:2.0.24]
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1228) ~[pdfbox-2.0.24.jar!/:2.0.24]
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1128) ~[pdfbox-2.0.24.jar!/:2.0.24]
> {code}
>
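The reporter extracts text and images from many files with 2.0.24, and some of those files are corrupt. In that situation, one option is to catch the IOException per file so that a single bad PDF does not abort the whole batch. The sketch below is a minimal version of that idea under stated assumptions. The `TextExtractor` interface and the `BatchExtract` class are hypothetical stand-ins, so the code compiles without the PDFBox jar. A real implementation for 2.0.x would call `PDDocument.load(file)` (as in the stack trace) and then `new PDFTextStripper().getText(doc)` inside a try-with-resources block.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BatchExtract {
    // Hypothetical abstraction over the real PDFBox calls (e.g. PDDocument.load
    // followed by PDFTextStripper.getText in PDFBox 2.0.x).
    interface TextExtractor {
        String extract(File file) throws IOException;
    }

    // Extracts text from every file; a file that throws IOException is recorded
    // in 'failed' and skipped instead of aborting the whole batch.
    static Map<File, String> extractAll(List<File> files, TextExtractor extractor, List<File> failed) {
        Map<File, String> results = new LinkedHashMap<>();
        for (File f : files) {
            try {
                results.put(f, extractor.extract(f));
            } catch (IOException e) {
                failed.add(f);
                System.err.println("Skipping " + f + ": " + e.getMessage());
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Fake extractor that simulates a corrupt file, for demonstration only.
        TextExtractor fake = f -> {
            if (f.getName().startsWith("corrupt")) {
                throw new IOException("Unknown dir object c=')' cInt=41");
            }
            return "text of " + f.getName();
        };
        List<File> failed = new ArrayList<>();
        Map<File, String> ok = extractAll(
                List.of(new File("a.pdf"), new File("corrupt.pdf")), fake, failed);
        System.out.println(ok.size() + " extracted, " + failed.size() + " failed");
    }
}
```

The failed files could then be retried with 3.0.0 RC1 or inspected by hand.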
--
This message was sent by Atlassian Jira
(v8.3.4#803005)