[ 
https://issues.apache.org/jira/browse/PDFBOX-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641579#comment-14641579
 ] 

Andreas Lehmkühler edited comment on PDFBOX-2845 at 7/25/15 1:32 PM:
---------------------------------------------------------------------

PDFBox is too strict. The spec says:
{quote}
The following objects shall not be stored in an object stream:
- .....
- An object representing the value of the Length entry in an object stream 
dictionary
{quote}
In the given case the length is an indirect object (554 0), but it is not the 
length of an object stream. It is the length of a simple stream (515 0), and 
the length object PDFBox is looking for is stored in one of the object streams 
(592 0), which hasn't been parsed yet. I removed the check and ran into another 
problem: an infinite-loop check throws an exception.
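
The distinction above can be sketched in a few lines of plain Java. This is 
only an illustration under stated assumptions, not PDFBox code: the class 
LengthCheckSketch, the StreamDict type and the compressedIn map are 
hypothetical stand-ins for the parser's stream dictionary and xref data. The 
object numbers are the ones from this report.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified model of the check; not actual PDFBox code. */
final class LengthCheckSketch {

    /** Stand-in for a stream dictionary with an indirect /Length entry. */
    static final class StreamDict {
        final String type;         // "ObjStm" for an object stream, null for a plain stream
        final int lengthObjNumber; // object number of the indirect /Length value

        StreamDict(String type, int lengthObjNumber) {
            this.type = type;
            this.lengthObjNumber = lengthObjNumber;
        }
    }

    /**
     * compressedIn maps an object number to the object stream that contains it;
     * objects that are not compressed have no entry.
     * Returns false only for the case the quoted spec text forbids: the /Length
     * of an object stream that is itself stored in an object stream.
     */
    static boolean lengthPlacementIsValid(StreamDict stream, Map<Integer, Integer> compressedIn) {
        boolean lengthIsCompressed = compressedIn.containsKey(stream.lengthObjNumber);
        boolean isObjectStream = "ObjStm".equals(stream.type);
        return !(isObjectStream && lengthIsCompressed);
    }

    public static void main(String[] args) {
        Map<Integer, Integer> compressedIn = new HashMap<>();
        compressedIn.put(554, 592); // 554 0 is stored in object stream 592 0

        StreamDict simpleStream = new StreamDict(null, 554);     // models stream 515 0
        StreamDict objectStream = new StreamDict("ObjStm", 554); // the forbidden case

        System.out.println(lengthPlacementIsValid(simpleStream, compressedIn)); // true
        System.out.println(lengthPlacementIsValid(objectStream, compressedIn)); // false
    }
}
```

A check that rejects every compressed length object, whatever the type of the 
owning stream, would refuse the first case as well, which is the behaviour 
reported in this issue.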


was (Author: lehmi):
PDFBox is too strict. The spec says:
{quote}
The following objects shall not be stored in an object stream:
- .....
- An object representing the value of the Length entry in an object stream 
dictionary
{quote}
In the given case the length is an indirect object, but it is not the length of 
an object stream. It is the length of a simple stream, and the length object 
PDFBox is looking for is stored in one of the object streams, which hasn't been 
parsed yet. I removed the check and ran into another problem: an infinite-loop 
check throws an exception.

> Error parsing PDF
> -----------------
>
>                 Key: PDFBOX-2845
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2845
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.0
>            Reporter: Christopher Clark
>             Fix For: 2.0.0
>
>
> I get the following error when parsing this pdf:  
> http://jmlr.csail.mit.edu/proceedings/papers/v28/ranganath13.pdf
> java.io.IOException: Object must be defined and must not be compressed object: 554:0
> Stack trace:
> Exception in thread "main" java.io.IOException: Object must be defined and must not be compressed object: 554:0
>         at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:682)
>         at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:646)
>         at org.apache.pdfbox.pdfparser.COSParser.getLength(COSParser.java:847)
>         at org.apache.pdfbox.pdfparser.COSParser.parseCOSStream(COSParser.java:906)
>         at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:732)
>         at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:693)
>         at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:646)
>         at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:607)
>         at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:198)
>         at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:225)
>         at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:848)
>         at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:793)
>         at org.apache.pdfbox.tools.ExtractText.startExtraction(ExtractText.java:192)
>         at org.apache.pdfbox.tools.ExtractText.main(ExtractText.java:81)
>         at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:55)
> Note this problem does not occur in 1.8.9


