[
https://issues.apache.org/jira/browse/PDFBOX-5245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387097#comment-17387097
]
Tilman Hausherr commented on PDFBOX-5245:
-----------------------------------------
Yeah, I see now that it fails on page 288 (labelled as 261). The content stream
of that page is corrupted too; the errors start at "-TV". If you trust that
file, try viewing that page in Adobe Reader. I expect the page will be
empty or incomplete.
{noformat}
[(Social )-58(media )-58(marketing )-TV)-492(shows)-492(tTJ
-1192(have00TT0 1 Tf
[( )-9.4 9)-8(pla 0 rais )-58(issue )-58(92(cont44005491opublished00492(k
)-492(sThisr )l)-492(1sappreh92(-8(mean sreh92)-8(anh92)TV)-492(pro0TT0
aailme10003>-90<0-9.4 9)t )-8(do<<005300500345600030044005706<00503>-166-166<00
6<0076<00B2040066<0670J0>-1r00461<00503>40J0>-1r600052000B20C00
6<00F0J0>-1r0000003>0050C00
6<00>-1r304004C005107<00500052005500003>-3>66<004-166<062<0670J0>-1r600052000B20C00
6<00F0J0>-1r0000003>0050C00
6<00>-1r0>-00F005<00E0578 Tf
T*
[(br1<0053011>Tj
/T
Q
-58
{noformat}
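To make the "-TV" observation concrete, here is a minimal sketch of a check that walks the operands of a TJ array and reports the first one that is not a literal string, a hex string, or a number. It is not PDFBox's parser and does not use any PDFBox API; the class name `TjArrayCheck` and its methods are hypothetical, and the check is deliberately simplified (it ignores comments and any other object types).

```java
// Hypothetical helper, not part of PDFBox: a simplified well-formedness
// check for the operand array of a TJ operator.
final class TjArrayCheck {

    /** Returns the offset of the first invalid operand, or -1 if the array is well-formed. */
    static int firstInvalidOffset(String s) {
        int n = s.length();
        int i = 0;
        while (i < n && isWhitespace(s.charAt(i))) {
            i++;
        }
        if (i >= n || s.charAt(i) != '[') {
            return i; // the array must start with '['
        }
        i++;
        while (i < n) {
            char c = s.charAt(i);
            if (isWhitespace(c)) {
                i++;
                continue;
            }
            if (c == ']') {
                return -1; // reached the end of the array without errors
            }
            int end;
            if (c == '(') {
                end = endOfLiteralString(s, i);
            } else if (c == '<') {
                end = endOfHexString(s, i);
            } else {
                end = endOfNumber(s, i);
            }
            if (end < 0) {
                return i; // this operand is none of the accepted types
            }
            i = end;
        }
        return n; // the closing ']' is missing
    }

    /** Literal string: balanced parentheses, backslash escapes the next character. */
    static int endOfLiteralString(String s, int start) {
        int depth = 0;
        for (int i = start; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\') {
                i++;
            } else if (c == '(') {
                depth++;
            } else if (c == ')' && --depth == 0) {
                return i + 1;
            }
        }
        return -1;
    }

    /** Hex string: '<', hex digits or whitespace, then '>'. */
    static int endOfHexString(String s, int start) {
        for (int i = start + 1; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '>') {
                return i + 1;
            }
            if (!isWhitespace(c) && Character.digit(c, 16) < 0) {
                return -1;
            }
        }
        return -1;
    }

    /** Number: optional sign, digits, optional fraction; at least one digit overall. */
    static int endOfNumber(String s, int start) {
        int n = s.length();
        int i = start;
        int digits = 0;
        if (i < n && (s.charAt(i) == '+' || s.charAt(i) == '-')) {
            i++;
        }
        while (i < n && isDigit(s.charAt(i))) {
            i++;
            digits++;
        }
        if (i < n && s.charAt(i) == '.') {
            i++;
            while (i < n && isDigit(s.charAt(i))) {
                i++;
                digits++;
            }
        }
        return digits > 0 ? i : -1;
    }

    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    static boolean isWhitespace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f' || c == 0;
    }

    public static void main(String[] args) {
        // First line of the damaged content stream quoted above.
        String damaged = "[(Social )-58(media )-58(marketing )-TV)-492(shows)-492(tTJ";
        int offset = firstInvalidOffset(damaged);
        System.out.println(offset < 0
                ? "well-formed"
                : "first invalid operand at offset " + offset + ": " + damaged.substring(offset));
    }
}
```

Run against the first line of the dump, the check stops at the "-TV" token, because a sign must be followed by at least one digit for the operand to count as a number.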
> IOException: Unknown dir object c=')' cInt=41 peek=')' peekInt=41 at offset
> 8571
> ---------------------------------------------------------------------------------
>
> Key: PDFBOX-5245
> URL: https://issues.apache.org/jira/browse/PDFBOX-5245
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 2.0.24
> Reporter: funaiy
> Priority: Major
>
> We extract the text and image content from PDF files with PDFBox, but some
> files throw an IOException. The PDFBox version is 2.0.24. Please help check.
>
> {code:java}
> Caused by: java.io.IOException: Unknown dir object c=')' cInt=41 peek=')' peekInt=41 at offset 8571 (start offset: 8571)
>     at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:913) ~[pdfbox-2.0.24.jar!/:2.0.24]
>     at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryValue(BaseParser.java:154) ~[pdfbox-2.0.24.jar!/:2.0.24]
>     at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryNameValuePair(BaseParser.java:288) ~[pdfbox-2.0.24.jar!/:2.0.24]
>     at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionary(BaseParser.java:218) ~[pdfbox-2.0.24.jar!/:2.0.24]
>     at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:857) ~[pdfbox-2.0.24.jar!/:2.0.24]
>     at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:907) ~[pdfbox-2.0.24.jar!/:2.0.24]
>     at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:876) ~[pdfbox-2.0.24.jar!/:2.0.24]
>     at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:796) ~[pdfbox-2.0.24.jar!/:2.0.24]
>     at org.apache.pdfbox.pdfparser.COSParser.parseTrailerValuesDynamically(COSParser.java:2858) ~[pdfbox-2.0.24.jar!/:2.0.24]
>     at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:175) ~[pdfbox-2.0.24.jar!/:2.0.24]
>     at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:226) ~[pdfbox-2.0.24.jar!/:2.0.24]
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1228) ~[pdfbox-2.0.24.jar!/:2.0.24]
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1128) ~[pdfbox-2.0.24.jar!/:2.0.24]
> {code}
>