Tilman Hausherr created PDFBOX-2351:
---------------------------------------

             Summary: /XRefStm content missing in saved file 
                 Key: PDFBOX-2351
                 URL: https://issues.apache.org/jira/browse/PDFBOX-2351
             Project: PDFBox
          Issue Type: Bug
          Components: Parsing
    Affects Versions: 2.0.0
            Reporter: Tilman Hausherr


Steps to reproduce:

- open one of immo-kurier_arsenal_93x62.pdf, PDFBOX-1577.pdf,
PDFBOX-1756-436857.pdf, PDFBOX-2251-070075.pdf, test-landscape2.pdf, or any
other file that has an /XRefStm entry, with loadNonSeq()
- call getDocumentCatalog()
- save to another file
- open the saved file with loadNonSeq()

Opening the saved file fails with:

{code}
java.io.IOException: Error: Expected a long type at offset 688, instead got '"'
        at org.apache.pdfbox.pdfparser.BaseParser.readLong(BaseParser.java:1718)
        at org.apache.pdfbox.pdfparser.BaseParser.readObjectNumber(BaseParser.java:1645)
        at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseXrefObjStream(NonSequentialPDFParser.java:548)
        at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.initialParse(NonSequentialPDFParser.java:410)
        at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parse(NonSequentialPDFParser.java:794)
        at org.apache.pdfbox.pdmodel.PDDocument.loadNonSeq(PDDocument.java:1156)
{code}
The saved file still has the old /XRefStm value, but the content it points to
is missing. I debugged a bit and it is confusing: when the original file is
parsed, the /XRefStm is never read; instead the /Prev entry is used, which
leads to an old-style xref table. When saving, doWriteXRef() keeps the existing
/XRefStm value even if PDFBox "believes" it has no XRefStream, whereas
doWriteXRefInc() is smarter and deletes the item if there is no XRefStream.
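
To confirm the dangling pointer in a saved file without going through the parser, a rough heuristic check like the sketch below may help. It is not PDFBox code: the class and method names are hypothetical, it uses only the JDK, and it assumes that every "/XRefStm <offset>" found in the raw bytes belongs to a trailer dictionary and should point at an object header of the form "N G obj". A match inside binary stream data would be a false positive.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox: heuristic scan for dangling /XRefStm offsets.
public class XRefStmCheck {
    // "/XRefStm <offset>" as it would appear in a trailer dictionary.
    private static final Pattern XREF_STM = Pattern.compile("/XRefStm\\s+(\\d+)");
    // An indirect object header such as "12 0 obj".
    private static final Pattern OBJ_HEADER = Pattern.compile("\\d+\\s+\\d+\\s+obj");

    /** Returns true if every /XRefStm offset found points at an object header. */
    static boolean xrefStmTargetsLookValid(byte[] pdf) {
        // ISO-8859-1 maps each byte to one char, so string indexes equal byte offsets.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        Matcher m = XREF_STM.matcher(text);
        while (m.find()) {
            long offset = Long.parseLong(m.group(1));
            if (offset >= text.length()) {
                return false; // offset lies beyond the end of the file
            }
            Matcher header = OBJ_HEADER.matcher(text);
            header.region((int) offset, text.length());
            if (!header.lookingAt()) {
                return false; // something other than "N G obj" at the offset
            }
        }
        return true; // also true when the file has no /XRefStm at all
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java XRefStmCheck <file.pdf>");
            return;
        }
        byte[] pdf = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(xrefStmTargetsLookValid(pdf) ? "ok" : "dangling /XRefStm");
    }
}
```

Running it against the file written in the third step above should report a dangling /XRefStm if the saved trailer still carries the old offset.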

I haven't tested this with 1.8. Once there is a fix, we should test it against
1.8 as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
