Tilman Hausherr created PDFBOX-2351:
---------------------------------------
Summary: /XRefStm content missing in saved file
Key: PDFBOX-2351
URL: https://issues.apache.org/jira/browse/PDFBOX-2351
Project: PDFBox
Issue Type: Bug
Components: Parsing
Affects Versions: 2.0.0
Reporter: Tilman Hausherr
Steps to reproduce:
- open immo-kurier_arsenal_93x62.pdf, PDFBOX-1577.pdf,
PDFBOX-1756-436857.pdf, PDFBOX-2251-070075.pdf, test-landscape2.pdf, or any
other file that has an /XRefStm, using loadNonSeq()
- call getDocumentCatalog()
- save to another file
- open that file with loadNonSeq(); this fails with:
{code}
java.io.IOException: Error: Expected a long type at offset 688, instead got '"'
    at org.apache.pdfbox.pdfparser.BaseParser.readLong(BaseParser.java:1718)
    at org.apache.pdfbox.pdfparser.BaseParser.readObjectNumber(BaseParser.java:1645)
    at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseXrefObjStream(NonSequentialPDFParser.java:548)
    at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.initialParse(NonSequentialPDFParser.java:410)
    at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parse(NonSequentialPDFParser.java:794)
    at org.apache.pdfbox.pdmodel.PDDocument.loadNonSeq(PDDocument.java:1156)
{code}
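To check a saved file without going through the parser, one option is to look up each /XRefStm value and see whether an object header ("N G obj") actually starts at that byte offset. The following is a rough, stdlib-only sketch. The class and method names are made up for illustration and are not part of PDFBox. It scans the raw bytes, so a "/XRefStm" sequence that happens to occur inside a stream would be misread; treat it as a heuristic only.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not PDFBox code.
final class XRefStmCheck {
    private static final Pattern XREF_STM = Pattern.compile("/XRefStm\\s+(\\d+)");
    private static final Pattern OBJ_HEADER = Pattern.compile("\\d+\\s+\\d+\\s+obj");

    /** Returns true if any /XRefStm entry names an offset where no "N G obj" header starts. */
    static boolean hasDanglingXRefStm(byte[] pdf) {
        // ISO-8859-1 maps every byte to exactly one char, so char indexes equal byte offsets.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        Matcher entry = XREF_STM.matcher(text);
        while (entry.find()) {
            long offset = Long.parseLong(entry.group(1));
            if (offset >= text.length()) {
                return true; // points past the end of the file
            }
            Matcher header = OBJ_HEADER.matcher(text);
            header.region((int) offset, text.length());
            if (!header.lookingAt()) {
                return true; // something other than an object header sits at that offset
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.out.println("usage: java XRefStmCheck <file.pdf>");
            return;
        }
        byte[] pdf = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(hasDanglingXRefStm(pdf) ? "dangling /XRefStm" : "no dangling /XRefStm found");
    }
}
```

Running it against both the original file and the saved copy should show whether the save step introduced the dangling entry.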
The saved file still contains the old /XRefStm value, but the content it
points to is missing. I debugged a bit and the behavior is confusing: on load,
the /XRefStm is never read; instead the /Prev is followed, which leads to an
old-style xref table. When saving, doWriteXRef() keeps the existing /XRefStm
value even if PDFBox "believes" it has no XRefStream, whereas doWriteXRefInc()
is smarter and deletes the item if there is no XRefStream.
I haven't tested this with 1.8. We should test that once there is a fix.
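The difference between the two save paths can be modeled in a few lines. This is only an illustration of the behavior described above, not PDFBox's code: a plain Map stands in for the trailer dictionary, and the class, method, and parameter names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the trailer handling, not PDFBox code.
final class TrailerCleanup {
    /**
     * Returns a copy of the trailer to write. If no xref stream will be written,
     * the stale XRefStm entry is dropped so it cannot point at missing content.
     */
    static Map<String, Object> trailerForSave(Map<String, Object> trailer, boolean writesXRefStream) {
        Map<String, Object> out = new LinkedHashMap<>(trailer);
        if (!writesXRefStream) {
            out.remove("XRefStm");
        }
        return out;
    }
}
```

With writesXRefStream set to false, the returned trailer keeps entries such as Prev and Size but no longer carries XRefStm, which matches what the report says doWriteXRefInc() already does.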