[
https://issues.apache.org/jira/browse/PDFBOX-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641579#comment-14641579
]
Andreas Lehmkühler commented on PDFBOX-2845:
--------------------------------------------
PDFBox is too strict. The spec says:
{quote}
The following objects shall not be stored in an object stream:
- .....
- An object representing the value of the Length entry in an object stream
dictionary
{quote}
In the given case the length is an indirect object, but it is not the length of
an object stream. It's the length of an ordinary stream, and the length object
PDFBox is looking for is stored in one of the object streams which hasn't been
parsed yet. I removed the check and ran into another problem: there is an
infinite loop check which throws an exception.
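
To illustrate the distinction the spec makes, here is a minimal, self-contained sketch. It does not use PDFBox code: the XrefEntry class, the lengthReferenceAllowed method, and the map standing in for the cross-reference table are all hypothetical names made up for this example. The only rule it encodes is the quoted one: a compressed Length object is forbidden only when the stream that owns it is an object stream.

```java
import java.util.HashMap;
import java.util.Map;

class LengthCheckSketch {

    /** Hypothetical cross-reference entry; this is not a PDFBox class. */
    static final class XrefEntry {
        /** True if the object is stored inside an object stream. */
        final boolean compressed;

        XrefEntry(boolean compressed) {
            this.compressed = compressed;
        }
    }

    /**
     * Decides whether an indirect /Length reference is acceptable.
     *
     * @param xref                stand-in for the cross-reference table
     * @param lengthObjNumber     object number the /Length entry points to
     * @param ownerIsObjectStream true if the stream owning /Length is an object stream
     */
    static boolean lengthReferenceAllowed(Map<Integer, XrefEntry> xref,
                                          int lengthObjNumber,
                                          boolean ownerIsObjectStream) {
        XrefEntry entry = xref.get(lengthObjNumber);
        if (entry == null) {
            // The length object is not defined at all.
            return false;
        }
        // Only the length of an object stream must not be compressed;
        // an ordinary stream may keep its length inside an object stream.
        return !(ownerIsObjectStream && entry.compressed);
    }

    public static void main(String[] args) {
        Map<Integer, XrefEntry> xref = new HashMap<>();
        // Object 554 from the error message, modelled as a compressed object.
        xref.put(554, new XrefEntry(true));

        // Owner is an ordinary stream: allowed.
        System.out.println(lengthReferenceAllowed(xref, 554, false));
        // Owner is an object stream: rejected.
        System.out.println(lengthReferenceAllowed(xref, 554, true));
    }
}
```

A stricter check that ignores ownerIsObjectStream would reject both calls, which corresponds to the behaviour reported in this issue. Accepting the reference is only half of the problem, though: the parser still has to parse the containing object stream before it can read the length value.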
> Error parsing PDF
> -----------------
>
> Key: PDFBOX-2845
> URL: https://issues.apache.org/jira/browse/PDFBOX-2845
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 2.0.0
> Reporter: Christopher Clark
> Fix For: 2.0.0
>
>
> I get the following error when parsing this pdf:
> http://jmlr.csail.mit.edu/proceedings/papers/v28/ranganath13.pdf
> java.io.IOException: Object must be defined and must not be compressed object: 554:0
> Stack trace:
> Exception in thread "main" java.io.IOException: Object must be defined and must not be compressed object: 554:0
> at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:682)
> at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:646)
> at org.apache.pdfbox.pdfparser.COSParser.getLength(COSParser.java:847)
> at org.apache.pdfbox.pdfparser.COSParser.parseCOSStream(COSParser.java:906)
> at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:732)
> at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:693)
> at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:646)
> at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:607)
> at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:198)
> at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:225)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:848)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:793)
> at org.apache.pdfbox.tools.ExtractText.startExtraction(ExtractText.java:192)
> at org.apache.pdfbox.tools.ExtractText.main(ExtractText.java:81)
> at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:55)
> Note this problem does not occur in 1.8.9