[ 
https://issues.apache.org/jira/browse/PDFBOX-5245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17387097#comment-17387097
 ] 

Tilman Hausherr commented on PDFBOX-5245:
-----------------------------------------

Yeah, I see now that it fails on page 288 (labelled as 261). The content 
stream of that page is corrupted too; the errors start at "-TV". If you trust 
that file, try viewing that page with Adobe Reader. I expect the page will be 
empty or incomplete.
{noformat}
[(Social )-58(media )-58(marketing )-TV)-492(shows)-492(tTJ
-1192(have00TT0 1 Tf
[( )-9.4 9)-8(pla 0 rais )-58(issue )-58(92(cont44005491opublished00492(k 
)-492(sThisr )l)-492(1sappreh92(-8(mean sreh92)-8(anh92)TV)-492(pro0TT0 
aailme10003>-90<0-9.4 9)t )-8(do<<005300500345600030044005706<00503>-166-166<00
6<0076<00B2040066<0670J0>-1r00461<00503>40J0>-1r600052000B20C00
6<00F0J0>-1r0000003>0050C00
6<00>-1r304004C005107<00500052005500003>-3>66<004-166<062<0670J0>-1r600052000B20C00
6<00F0J0>-1r0000003>0050C00
6<00>-1r0>-00F005<00E0578 Tf
T*
[(br1<0053011>Tj
/T
Q

-58
{noformat}
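
To illustrate why "-TV" is where things go wrong: inside a TJ array, every element has to be a string (literal or hex) or a number, and "-TV" is neither. The following is only a rough stdlib sketch of that rule, not PDFBox code; the class name TjArrayCheck and the method firstBadOffset are made up for this example, and it ignores comments and other details of the real content stream syntax.

```java
// Hypothetical helper, not part of PDFBox: scans the text between '[' and ']'
// of a TJ operand and reports where the first invalid element begins.
final class TjArrayCheck {

    private TjArrayCheck() {
    }

    /** Returns the offset of the first invalid element, or -1 if none is found. */
    static int firstBadOffset(String body) {
        int i = 0;
        while (i < body.length()) {
            char c = body.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            int end;
            if (c == '(') {
                end = endOfLiteralString(body, i);
            } else if (c == '<') {
                end = endOfHexString(body, i);
            } else {
                end = endOfNumber(body, i);
            }
            if (end < 0) {
                return i; // element starting here is not a string or number
            }
            i = end;
        }
        return -1;
    }

    // Literal strings may contain balanced parentheses and backslash escapes.
    private static int endOfLiteralString(String s, int start) {
        int depth = 0;
        for (int i = start; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped character
            } else if (c == '(') {
                depth++;
            } else if (c == ')' && --depth == 0) {
                return i + 1;
            }
        }
        return -1; // unterminated string
    }

    // Hex strings contain only hex digits and whitespace between '<' and '>'.
    private static int endOfHexString(String s, int start) {
        for (int i = start + 1; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '>') {
                return i + 1;
            }
            if (Character.digit(c, 16) < 0 && !Character.isWhitespace(c)) {
                return -1;
            }
        }
        return -1; // unterminated hex string
    }

    // Numbers: optional sign, digits, at most one decimal point, at least one digit.
    private static int endOfNumber(String s, int start) {
        int i = start;
        if (i < s.length() && (s.charAt(i) == '+' || s.charAt(i) == '-')) {
            i++;
        }
        int digits = 0;
        boolean seenDot = false;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c >= '0' && c <= '9') {
                digits++;
            } else if (c == '.' && !seenDot) {
                seenDot = true;
            } else {
                break;
            }
            i++;
        }
        return digits > 0 ? i : -1;
    }
}
```

Called on the start of the first line of the dump above, i.e. firstBadOffset("(Social )-58(media )-58(marketing )-TV)-492(shows)"), this should return the offset of the "-" in "-TV", because a sign followed by letters is not a number.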

> IOException: Unknown dir object c=')' cInt=41 peek=')' peekInt=41 at offset 
> 8571 
> ---------------------------------------------------------------------------------
>
>                 Key: PDFBOX-5245
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5245
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.24
>            Reporter: funaiy
>            Priority: Major
>
> We extract the text and image content from PDFs with PDFBox, but some PDF 
> files throw an IOException. The PDFBox version is 2.0.24. Please help check.
>   
> {code:java}
> Caused by: java.io.IOException: Unknown dir object c=')' cInt=41 peek=')' peekInt=41 at offset 8571 (start offset: 8571)
>       at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:913) ~[pdfbox-2.0.24.jar!/:2.0.24]
>       at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryValue(BaseParser.java:154) ~[pdfbox-2.0.24.jar!/:2.0.24]
>       at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryNameValuePair(BaseParser.java:288) ~[pdfbox-2.0.24.jar!/:2.0.24]
>       at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionary(BaseParser.java:218) ~[pdfbox-2.0.24.jar!/:2.0.24]
>       at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:857) ~[pdfbox-2.0.24.jar!/:2.0.24]
>       at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:907) ~[pdfbox-2.0.24.jar!/:2.0.24]
>       at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:876) ~[pdfbox-2.0.24.jar!/:2.0.24]
>       at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:796) ~[pdfbox-2.0.24.jar!/:2.0.24]
>       at org.apache.pdfbox.pdfparser.COSParser.parseTrailerValuesDynamically(COSParser.java:2858) ~[pdfbox-2.0.24.jar!/:2.0.24]
>       at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:175) ~[pdfbox-2.0.24.jar!/:2.0.24]
>       at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:226) ~[pdfbox-2.0.24.jar!/:2.0.24]
>       at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1228) ~[pdfbox-2.0.24.jar!/:2.0.24]
>       at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1128) ~[pdfbox-2.0.24.jar!/:2.0.24]
> {code}
>  


