[ 
https://issues.apache.org/jira/browse/PDFBOX-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641579#comment-14641579
 ] 

Andreas Lehmkühler commented on PDFBOX-2845:
--------------------------------------------

PDFBox is too strict. The spec says
{quote}
The following objects shall not be stored in an object stream:
- .....
- An object representing the value of the Length entry in an object stream 
dictionary
{quote}
In the given case the length is an indirect object, but it is not the length of an 
object stream. It's the length of some simple stream, and the length PDFBox is 
looking for is in one of the object streams which isn't parsed yet. I've removed 
the check and ran into another problem: there is an infinite loop check which 
throws an exception.
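
A minimal sketch of the narrower rule quoted above, for illustration only. The class and method names are hypothetical and are not part of the PDFBox API; the sketch only encodes the reading that a compressed Length object is forbidden when the stream that owns it is itself an object stream, and is acceptable for a simple stream.

```java
// Hypothetical helper, not PDFBox code: models the restriction from the spec quote.
final class LengthCheckSketch {

    private LengthCheckSketch() {
    }

    /**
     * @param ownerIsObjectStream true if the stream whose Length is being resolved
     *                            is an object stream (Type /ObjStm)
     * @param lengthIsCompressed  true if the indirect Length object is stored
     *                            inside an object stream
     * @return true if this placement of the Length object is allowed
     */
    static boolean isLengthPlacementAllowed(boolean ownerIsObjectStream,
                                            boolean lengthIsCompressed) {
        // Only the combination "Length of an object stream, stored in an
        // object stream" is ruled out by the quoted sentence.
        return !(ownerIsObjectStream && lengthIsCompressed);
    }
}
```

Under this reading, the case from the report (a simple stream whose Length lives in an object stream) would be accepted, provided that the object stream holding the Length value can be parsed first.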

> Error parsing PDF
> -----------------
>
>                 Key: PDFBOX-2845
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2845
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.0
>            Reporter: Christopher Clark
>             Fix For: 2.0.0
>
>
> I get the following error when parsing this pdf:  
> http://jmlr.csail.mit.edu/proceedings/papers/v28/ranganath13.pdf
> java.io.IOException: Object must be defined and must not be compressed object: 554:0
> Stack trace:
> Exception in thread "main" java.io.IOException: Object must be defined and must not be compressed object: 554:0
>         at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:682)
>         at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:646)
>         at org.apache.pdfbox.pdfparser.COSParser.getLength(COSParser.java:847)
>         at org.apache.pdfbox.pdfparser.COSParser.parseCOSStream(COSParser.java:906)
>         at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:732)
>         at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:693)
>         at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:646)
>         at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:607)
>         at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:198)
>         at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:225)
>         at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:848)
>         at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:793)
>         at org.apache.pdfbox.tools.ExtractText.startExtraction(ExtractText.java:192)
>         at org.apache.pdfbox.tools.ExtractText.main(ExtractText.java:81)
>         at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:55)
> Note: this problem does not occur in 1.8.9.



