Tilman Hausherr created PDFBOX-1811:
---------------------------------------
Summary: java.io.IOException: Object at offset does not end with 'endobj'
Key: PDFBOX-1811
URL: https://issues.apache.org/jira/browse/PDFBOX-1811
Project: PDFBox
Issue Type: Bug
Affects Versions: 2.0.0
Environment: XP, W7
Reporter: Tilman Hausherr
I get this exception with the file amyuni2_05d__pdf1_3_acro4x.pdf (it was once
part of the project and has since been removed, but it can still be found on the web):
java.io.IOException: Object (48:0) at offset 161333 does not end with 'endobj'.
at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseObjectDynamically(NonSequentialPDFParser.java:1312)
at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseObjectDynamically(NonSequentialPDFParser.java:1159)
at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseDictObjects(NonSequentialPDFParser.java:1133)
at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.initialParse(NonSequentialPDFParser.java:470)
at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parse(NonSequentialPDFParser.java:731)
at org.apache.pdfbox.pdmodel.PDDocument.loadNonSeq(PDDocument.java:1139)
at org.apache.pdfbox.pdmodel.PDDocument.loadNonSeq(PDDocument.java:1122)
at pdfboxpageimageextraction.ExtractImages.doPdf(ExtractImages.java:134)
at pdfboxpageimageextraction.ExtractImages.main(ExtractImages.java:78)
The message is correct: the 'endobj' keyword is indeed missing in that file.
However, the content of endObjectKey is "49 0 obj", i.e. the start of the next
object, so the object itself can still be read.
So my suggestion is to change the following segment in NonSequentialPDFParser.java
{code}
if (!endObjectKey.startsWith("endobj"))
{
    throw new IOException("Object (" + readObjNr + ":" + readObjGen + ") at offset "
            + offsetOrObjstmObNr + " does not end with 'endobj'.");
}
{code}
to
{code}
if (!endObjectKey.startsWith("endobj"))
{
    if (endObjectKey.endsWith(" obj"))
    {
        // The next object starts right away: tolerate the missing 'endobj'.
        LOG.warn("Object (" + readObjNr + ":" + readObjGen + ") at offset "
                + offsetOrObjstmObNr + " does not end with 'endobj' but with '"
                + endObjectKey + "'");
    }
    else
    {
        throw new IOException("Object (" + readObjNr + ":" + readObjGen + ") at offset "
                + offsetOrObjstmObNr + " does not end with 'endobj' but with '"
                + endObjectKey + "'");
    }
}
{code}
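To try the proposed behaviour outside the parser, here is a minimal standalone sketch of the same decision. It is not PDFBox code: the class EndObjTerminatorCheck, the Terminator enum and the methods classify and check are hypothetical names made up for illustration, and java.util.logging stands in for the parser's LOG so that the example has no dependencies.

```java
import java.io.IOException;
import java.util.logging.Logger;

// Hypothetical helper, not part of PDFBox: isolates the lenient 'endobj' check.
final class EndObjTerminatorCheck
{
    // Assumption: java.util.logging replaces the parser's own LOG here.
    private static final Logger LOG =
            Logger.getLogger(EndObjTerminatorCheck.class.getName());

    enum Terminator { ENDOBJ, NEXT_OBJECT_HEADER, INVALID }

    // Classifies the token that was read where 'endobj' is expected.
    static Terminator classify(String endObjectKey)
    {
        if (endObjectKey.startsWith("endobj"))
        {
            return Terminator.ENDOBJ;
        }
        if (endObjectKey.endsWith(" obj"))
        {
            // e.g. "49 0 obj": the header of the following object
            return Terminator.NEXT_OBJECT_HEADER;
        }
        return Terminator.INVALID;
    }

    // Warns when the next object header follows directly, throws otherwise.
    static void check(String endObjectKey, long objNr, int objGen, long offset)
            throws IOException
    {
        String message = "Object (" + objNr + ":" + objGen + ") at offset " + offset
                + " does not end with 'endobj' but with '" + endObjectKey + "'";
        switch (classify(endObjectKey))
        {
            case ENDOBJ:
                return;
            case NEXT_OBJECT_HEADER:
                LOG.warning(message);
                return;
            default:
                throw new IOException(message);
        }
    }
}
```

With the values from the stack trace above, check("49 0 obj", 48, 0, 161333L) would only log a warning, while any other unexpected token would still raise the IOException. The endsWith(" obj") test is kept as loose as in the suggestion; a stricter variant could require the "number number obj" shape.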
--
This message was sent by Atlassian JIRA
(v6.1.4#6159)