[
https://issues.apache.org/jira/browse/PDFBOX-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052420#comment-13052420
]
Thomas Chojecki commented on PDFBOX-1014:
-----------------------------------------
Sorting the objects by object ID was also my first attempt to solve
this, but I don't think it works for all documents. I have one document
where that order is not the right one.
I will test your patch with my copy and tell you whether the document I have
parses correctly.
I'm also writing a patch at the moment, and I will pause it until I have
tested your patch.
Maybe we can wait a moment before committing the patch.
> Unused XRef object streams cause parser to fail + FIX
> -----------------------------------------------------
>
> Key: PDFBOX-1014
> URL: https://issues.apache.org/jira/browse/PDFBOX-1014
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 1.6.0
> Reporter: Timo Boehme
>
> I have a PDF document with 3 XRef streams (no xref table; PDF version 1.6).
> Currently PDFBox reads and parses all 3 streams in the order they appear and
> combines the data in a dictionary (thus attributes specified in a later XRef
> stream overwrite attributes from earlier streams). The problem with my document
> is that the first 2 XRef streams declare document encryption while the last
> one does not. Furthermore, the last one uses a different document ID, so trying
> to decrypt the document would fail because of the different IDs (however,
> parsing the stream in the first XRef object already fails before that).
> The solution I came up with is to first get all XRef streams, then start
> at the last one, check whether it contains a 'Prev' key, and go up the list
> as long as we find this 'Prev' key. This should work in most cases, assuming
> that multiple active XRef sections appear in order without an unused XRef
> section in between. A fully correct solution would have to compare object
> byte positions, which would require storing the byte position of each
> object.
> The fix in COSDocument.parseXrefStreams():
> public void parseXrefStreams() throws IOException
> {
>     COSDictionary trailerDict = new COSDictionary();
>
>     // use only the last XRef and XRefs which are referenced by a used XRef via 'Prev';
>     // we assume that 'Prev' references the next preceding xref object
>     // (otherwise we would have to use object byte positions)
>     List<COSObject> xrefStreams = getObjectsByType( "XRef" );
>     // clamp to 0 so that a document without XRef streams skips both loops
>     int firstXRefIdx = Math.max( 0, xrefStreams.size() - 1 );
>     while ( firstXRefIdx > 0 )
>     {
>         COSStream stream = (COSStream)xrefStreams.get( firstXRefIdx ).getObject();
>         if ( stream.getInt( COSName.PREV, -1 ) == -1 )
>         {
>             // no 'Prev' key; current xref object will be the first one we use
>             break;
>         }
>         // 'Prev' key present; also use the preceding xref object
>         firstXRefIdx--;
>     }
>
>     // for( COSObject xrefStream : getObjectsByType( "XRef" ) )
>     for ( int xrefIdx = firstXRefIdx, len = xrefStreams.size(); xrefIdx < len; xrefIdx++ )
>     {
>         COSStream stream = (COSStream)xrefStreams.get( xrefIdx ).getObject();
>         // later streams overwrite trailer attributes of earlier ones
>         trailerDict.addAll(stream);
>         PDFXrefStreamParser parser =
>             new PDFXrefStreamParser(stream, this, forceParsing);
>         parser.parse();
>     }
>     setTrailer( trailerDict );
> }
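
The byte-position approach mentioned in the description could look roughly like the sketch below. It is only an illustration and not PDFBox code: XrefChain, XrefSection and mergeActiveTrailers are hypothetical names, the trailer is simplified to a map of strings, and it assumes the parser has recorded the byte offset of every XRef stream as well as the offset the document's startxref entry points to.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class XrefChain {

    /** Hypothetical stand-in for one parsed XRef stream; not a PDFBox class. */
    static final class XrefSection {
        final long prevOffset;              // value of 'Prev', or -1 if absent
        final Map<String, String> trailer;  // simplified trailer entries

        XrefSection(long prevOffset, Map<String, String> trailer) {
            this.prevOffset = prevOffset;
            this.trailer = trailer;
        }
    }

    /**
     * Follows 'Prev' starting at startOffset and merges the trailer entries
     * oldest first, so entries of newer sections overwrite older ones.
     * Sections that are never reached through 'Prev' are ignored.
     */
    static Map<String, String> mergeActiveTrailers(
            Map<Long, XrefSection> sectionsByOffset, long startOffset) {
        Deque<XrefSection> chain = new ArrayDeque<>();
        Set<Long> visited = new HashSet<>();
        long offset = startOffset;
        // visited guards against a 'Prev' cycle in a damaged file
        while (offset >= 0 && visited.add(offset)) {
            XrefSection section = sectionsByOffset.get(offset);
            if (section == null) {
                break; // 'Prev' points to an unknown position; stop here
            }
            chain.push(section); // the oldest section ends up at the head
            offset = section.prevOffset;
        }
        Map<String, String> merged = new LinkedHashMap<>();
        for (XrefSection section : chain) { // iterates oldest to newest
            merged.putAll(section.trailer);
        }
        return merged;
    }
}
```

With a lookup like this, an unused XRef stream sitting between two active ones would be skipped, which the index-based walk in the patch above cannot guarantee.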