Wrong XRefStream order while parsing incremental updated PDF with XRefStreams
-----------------------------------------------------------------------------

                 Key: PDFBOX-1042
                 URL: https://issues.apache.org/jira/browse/PDFBOX-1042
             Project: PDFBox
          Issue Type: Bug
          Components: Parsing
    Affects Versions: 1.5.0
            Reporter: Thomas Chojecki
            Priority: Critical


A PDF can store its cross-reference entries in two forms: XRefTables and
XRefStreams. Most files use XRefTables for object references.

Web-optimized (linearized) PDF documents use XRefStreams. An XRefStream is a
compressed XRefTable stored as a stream object. The PDFParser parses these
objects the same way as other objects and puts them into an object pool (a
HashMap). If the document was incrementally updated, it contains several
XRefStreams, and all of them are put into the object pool.

The XRefStreamParser then parses the XRefStreams by fetching all XRefStream
objects from that pool. The pool does not return the objects in the order in
which they were read, so in some cases an older object overwrites a newer one.
As a result, PDFBox cannot find the right objects and uses the older ones
instead.
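
The effect can be reproduced outside PDFBox with a small, hypothetical model.
The sketch below is not PDFBox code: XRefOrderSketch, Section, merge and
mergeInFileOrder are names made up for illustration. It assumes that each
incremental update is appended to the file, so that a newer cross-reference
section always has a higher file offset than an older one.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the problem; none of these types exist in PDFBox.
final class XRefOrderSketch {

    /** One cross-reference section: its position in the file and its entries. */
    static final class Section {
        final long fileOffset;             // byte offset of the section itself
        final Map<Integer, Long> entries;  // object number -> byte offset of the object

        Section(long fileOffset, Map<Integer, Long> entries) {
            this.fileOffset = fileOffset;
            this.entries = entries;
        }
    }

    /** Applies sections in iteration order; later sections overwrite earlier entries. */
    static Map<Integer, Long> merge(Iterable<Section> sections) {
        Map<Integer, Long> table = new HashMap<>();
        for (Section section : sections) {
            table.putAll(section.entries);
        }
        return table;
    }

    /**
     * Sorts the sections by file offset before merging, so that the newest
     * section is applied last (assumption: updates are appended to the file).
     */
    static Map<Integer, Long> mergeInFileOrder(Collection<Section> sections) {
        List<Section> ordered = new ArrayList<>(sections);
        ordered.sort(Comparator.comparingLong(s -> s.fileOffset));
        return merge(ordered);
    }

    public static void main(String[] args) {
        Section older = new Section(100L, Map.of(1, 15L, 2, 60L));
        Section newer = new Section(900L, Map.of(1, 700L)); // object 1 was updated

        // Simulates a pool that hands back the newer section first.
        List<Section> fromPool = List.of(newer, older);

        System.out.println("pool order: " + merge(fromPool));
        System.out.println("file order: " + mergeInFileOrder(fromPool));
    }
}
```

With the pool order, object 1 resolves to the stale offset from the older
section; after sorting, it resolves to the updated one. A real parser might
instead follow the /Prev entries that link each cross-reference section to
its predecessor, which does not depend on the file-offset assumption.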

A user who parses such a document gets an indeterminate state in which older
and newer objects are mixed.

In my case, the document catalog was overwritten by an old one, so I could not
see the changes that were made by the incremental update.

A patch and a sample PDF will follow soon.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
