Wrong XRefStream order while parsing incremental updated PDF with XRefStreams
-----------------------------------------------------------------------------
Key: PDFBOX-1042
URL: https://issues.apache.org/jira/browse/PDFBOX-1042
Project: PDFBox
Issue Type: Bug
Components: Parsing
Affects Versions: 1.5.0
Reporter: Thomas Chojecki
Priority: Critical
A PDF can contain two types of XRef entries.
Most files use XRefTables for object references.
Some documents, for example web-optimized (linearized) ones, use XRefStreams
instead. An XRefStream is a compressed XRefTable stored as a stream object.
The PDFParser parses these objects the same way as other objects and puts them
into an object pool (a HashMap). If the document was incrementally updated, it
contains several XRefStreams, and all of them are put into the object pool.
The XRefStreamParser then parses the XRefStreams and tries to fetch all
XRefStream objects from that pool. The objects returned from the pool are not
in the order in which they were read, because a HashMap does not preserve
insertion order. As a result, in some cases an older object overwrites a newer
one, so PDFBox cannot find the right objects and uses the older ones instead.
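
The overwrite problem described above can be illustrated with a minimal
sketch. It does not use the PDFBox API: XrefMergeSketch, XrefSection,
readOrder and merge are hypothetical names chosen for illustration, and the
sketch assumes that a section read later in the file belongs to a newer
incremental update. Under that assumption, sorting the sections by read order
before merging lets the newer entries win, regardless of the order in which the
pool returns them.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not PDFBox code.
class XrefMergeSketch {

    /** Stand-in for one parsed XRefStream. */
    static final class XrefSection {
        final int readOrder;               // position in which the parser read it (assumed: later = newer)
        final Map<Integer, Long> entries;  // object number -> byte offset of that object

        XrefSection(int readOrder, Map<Integer, Long> entries) {
            this.readOrder = readOrder;
            this.entries = entries;
        }
    }

    /** Applies the sections oldest first, so entries from later updates overwrite earlier ones. */
    static Map<Integer, Long> merge(Collection<XrefSection> sections) {
        List<XrefSection> ordered = new ArrayList<>(sections);
        ordered.sort(Comparator.comparingInt(s -> s.readOrder));
        Map<Integer, Long> resolved = new TreeMap<>();
        for (XrefSection section : ordered) {
            resolved.putAll(section.entries);
        }
        return resolved;
    }

    public static void main(String[] args) {
        // Original revision: object 1 (say, the catalog) and object 2.
        Map<Integer, Long> original = new HashMap<>();
        original.put(1, 15L);
        original.put(2, 120L);

        // Incremental update: a new version of object 1 appended to the file.
        Map<Integer, Long> update = new HashMap<>();
        update.put(1, 5000L);

        // A HashMap-backed pool gives no guarantee about iteration order.
        Map<String, XrefSection> pool = new HashMap<>();
        pool.put("update", new XrefSection(1, update));
        pool.put("original", new XrefSection(0, original));

        // Object 1 resolves to the updated offset whatever order the pool yields.
        System.out.println(merge(pool.values())); // prints {1=5000, 2=120}
    }
}
```

Replacing the pool with an insertion-ordered map such as LinkedHashMap would be
another way to keep the read order; the explicit sort is used here only because
it makes the ordering assumption visible.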
If a user tries to parse such a document, the result is an indeterminate state
in which older and newer objects are mixed.
In my case, the document catalog was overwritten by an old one, so I could not
see the changes that were made by the incremental update.
A patch and a sample PDF will follow soon.