[
https://issues.apache.org/jira/browse/PDFBOX-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641579#comment-14641579
]
Andreas Lehmkühler edited comment on PDFBOX-2845 at 7/25/15 1:32 PM:
---------------------------------------------------------------------
PDFBox is too strict. The spec says
{quote}
The following objects shall not be stored in an object stream:
- .....
- An object representing the value of the Length entry in an object stream
dictionary
{quote}
In the given case the length is an indirect object (554 0), but it is not the
length of an object stream. It is the length of an ordinary stream (515 0), and
the length PDFBox is looking for is stored in one of the object streams (592 0),
which hasn't been parsed yet. I removed the check and ran into another problem:
there is an infinite loop check which throws an exception.
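
As a rough illustration of the distinction above, here is a minimal, self-contained sketch of the rule as the spec states it. It does not use PDFBox classes and is not the actual COSParser code; `LengthRuleSketch`, `XrefEntry`, `StreamInfo` and `isLengthPlacementAllowed` are hypothetical names made up for this example. The assumption is only that the parser knows, for each indirect object, whether it is stored inside an object stream.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model, not PDFBox API: it only encodes the spec rule quoted above.
final class LengthRuleSketch {

    /** Hypothetical cross-reference entry for one indirect object. */
    static final class XrefEntry {
        final boolean compressed; // true if the object is stored inside an object stream

        XrefEntry(boolean compressed) {
            this.compressed = compressed;
        }
    }

    /** Hypothetical description of a stream whose Length is an indirect reference. */
    static final class StreamInfo {
        final boolean isObjectStream;  // true if the stream itself is an object stream
        final long lengthObjectNumber; // object number of the indirect Length value

        StreamInfo(boolean isObjectStream, long lengthObjectNumber) {
            this.isObjectStream = isObjectStream;
            this.lengthObjectNumber = lengthObjectNumber;
        }
    }

    /**
     * Returns true if the placement of the Length object is acceptable.
     * Only the Length of an object stream is barred from living in an object
     * stream; the Length of an ordinary stream may be compressed.
     */
    static boolean isLengthPlacementAllowed(StreamInfo stream, Map<Long, XrefEntry> xref) {
        XrefEntry entry = xref.get(stream.lengthObjectNumber);
        if (entry == null) {
            return false; // the Length object is not defined at all
        }
        return !(stream.isObjectStream && entry.compressed);
    }

    public static void main(String[] args) {
        Map<Long, XrefEntry> xref = new HashMap<>();
        xref.put(554L, new XrefEntry(true)); // 554 0 is stored in an object stream

        // Ordinary stream (like 515 0) whose Length is the compressed object 554 0.
        StreamInfo ordinary = new StreamInfo(false, 554L);
        // An object stream whose Length would be a compressed object.
        StreamInfo objectStream = new StreamInfo(true, 554L);

        System.out.println(isLengthPlacementAllowed(ordinary, xref));
        System.out.println(isLengthPlacementAllowed(objectStream, xref));
    }
}
```

With this reading, the file from the report falls into the first case, so rejecting it only because 554 0 is compressed is stricter than the spec requires.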
was (Author: lehmi):
PDFBox is too strict. The spec says
{quote}
The following objects shall not be stored in an object stream:
- .....
- An object representing the value of the Length entry in an object stream
dictionary
{quote}
In the given case the length is an indirect object but not the length of an
object stream. It's the length of some simple stream and the length PDFBox is
looking for is in one of the object streams which isn't yet parsed. I've removed
the check and ran into another problem. There is an infinite loop check which
throws an exception.
> Error parsing PDF
> -----------------
>
> Key: PDFBOX-2845
> URL: https://issues.apache.org/jira/browse/PDFBOX-2845
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 2.0.0
> Reporter: Christopher Clark
> Fix For: 2.0.0
>
>
> I get the following error when parsing this pdf:
> http://jmlr.csail.mit.edu/proceedings/papers/v28/ranganath13.pdf
> java.io.IOException: Object must be defined and must not be compressed object: 554:0
> Stack trace:
> Exception in thread "main" java.io.IOException: Object must be defined and must not be compressed object: 554:0
> at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:682)
> at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:646)
> at org.apache.pdfbox.pdfparser.COSParser.getLength(COSParser.java:847)
> at org.apache.pdfbox.pdfparser.COSParser.parseCOSStream(COSParser.java:906)
> at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:732)
> at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:693)
> at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:646)
> at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:607)
> at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:198)
> at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:225)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:848)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:793)
> at org.apache.pdfbox.tools.ExtractText.startExtraction(ExtractText.java:192)
> at org.apache.pdfbox.tools.ExtractText.main(ExtractText.java:81)
> at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:55)
> Note this problem does not occur in 1.8.9