[
https://issues.apache.org/jira/browse/PDFBOX-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052555#comment-13052555
]
Timo Boehme commented on PDFBOX-1016:
-------------------------------------
If you applied the patch from this fix (including XrefTrailerResolver.java etc.),
you should get the correct order, where 2 overwrites 1. XrefTrailerResolver
first builds the list of XREFs (in setStartxref()), which in this case is
[2; 1]. This list is then reversed to [1; 2], and the table is built from the
reversed list (entries from 1 are applied first, then entries from 2, possibly
overwriting entries from 1).
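
To make the ordering concrete, here is a minimal standalone sketch in Java. It
is not the code from the attached XrefTrailerResolver.java: the class
XrefOrderSketch, the XrefSection type and the resolve() method are hypothetical
names invented for this illustration, and the byte offsets are made up. It only
shows the idea of following 'Prev' from startxref, reversing the chain, and
applying the sections so that newer entries overwrite older ones, with a
document-order fallback when startxref does not point at a known xref section.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration only; not part of PDFBox.
public class XrefOrderSketch {

    /** One xref section: object number -> byte offset, plus its 'Prev' pointer (-1 if absent). */
    public static final class XrefSection {
        final Map<Integer, Long> entries = new HashMap<>();
        final long prev;

        public XrefSection(long prev) { this.prev = prev; }

        public XrefSection put(int objNr, long offset) {
            entries.put(objNr, offset);
            return this;
        }
    }

    /**
     * Builds the effective xref table.
     * sections: every xref section seen while parsing, keyed by its byte position.
     * startxref: the value read from the last startxref keyword.
     */
    public static Map<Integer, Long> resolve(Map<Long, XrefSection> sections, long startxref) {
        List<XrefSection> order = new ArrayList<>();
        if (sections.containsKey(startxref)) {
            // Follow the 'Prev' chain from the newest section: e.g. [2; 1].
            Set<Long> seen = new HashSet<>(); // guards against cyclic 'Prev' pointers
            long pos = startxref;
            while (sections.containsKey(pos) && seen.add(pos)) {
                XrefSection section = sections.get(pos);
                order.add(section);
                pos = section.prev;
            }
            // Reverse so the oldest section is applied first: [1; 2].
            Collections.reverse(order);
        } else {
            // Fallback: apply sections in the order they appear in the file.
            System.err.println("WARN: startxref missing or wrong, using document order");
            order.addAll(new TreeMap<>(sections).values());
        }
        Map<Integer, Long> table = new TreeMap<>();
        for (XrefSection section : order) {
            table.putAll(section.entries); // later sections overwrite earlier ones
        }
        return table;
    }

    public static void main(String[] args) {
        Map<Long, XrefSection> sections = new HashMap<>();
        // Section "1" (older) sits later in the file; section "2" (newer) sits first.
        sections.put(900L, new XrefSection(-1L).put(5, 1000L).put(6, 1100L));
        sections.put(100L, new XrefSection(900L).put(5, 200L));

        // Valid startxref: object 5 resolves to the newer offset 200.
        System.out.println(resolve(sections, 100L));
        // Wrong startxref: document order [2; 1] lets the stale offset 1000 win.
        System.out.println(resolve(sections, 12345L));
    }
}
```

In the sketch, the second call shows why the fallback can give the "wrong"
order for a file laid out like this one: section 1 is applied last and
overwrites the newer entry from section 2.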
Only if startxref was not set or points to a wrong byte offset does the
algorithm fall back to the current trunk behavior, using the xrefs in the order
they appear in the document (here [2; 1]). In that case you should have seen a
warning in the log.
Could you please check that startxref is set and that no warning messages are
generated?
> Specification conform xref/trailer parsing + Fix
> ------------------------------------------------
>
> Key: PDFBOX-1016
> URL: https://issues.apache.org/jira/browse/PDFBOX-1016
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 1.6.0
> Reporter: Timo Boehme
> Attachments: COSDocument.diff, PDFParser.diff,
> PDFXrefStreamParser.diff, XrefTrailerResolver.java, XrefTrailerResolver.java
>
> Original Estimate: 10m
> Remaining Estimate: 10m
>
> PDFBOX currently reads xref tables/trailers and XRef objects without using
> startxref or 'Prev' information. As a result, data that is no longer active
> is applied, which leads to wrong objects being used or to parsing exceptions
> because old trailer settings no longer apply. This happens especially with
> updated PDF documents where changes are simply appended and old objects/xref
> entries remain but are no longer referenced. My last patch (PDFBOX-1014)
> tried to solve this for a specific case, but it was based on assumptions
> which do not hold in every case.
> The specification-compliant way is to read the last startxref, which points
> to the last xref object, which itself may reference further xref objects
> using the 'Prev' attribute.
> I have written a fix which works the standard way and can fall back to the
> old behavior in case startxref is wrong or missing. The fix tries to be as
> unobtrusive as possible. A new class (o.a.p.pdfparser.XrefTrailerResolver) is
> filled with all xref table/trailer and XRef object data. After the document
> is parsed (and the last startxref is read), this class creates the xref table
> and trailer using the startxref and 'Prev' information. Besides this new
> class, there are small changes to PDFParser and COSDocument.
> This bugfix/improvement should bring PDFBOX a good step closer to conforming
> to the PDF specification, especially as long as the new
> specification-conformant parser project is not finished.
> This bugfix supersedes the fix from PDFBOX-1014.