[ https://issues.apache.org/jira/browse/PDFBOX-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andreas Lehmkühler closed PDFBOX-1014. -------------------------------------- Assignee: Andreas Lehmkühler > Unused XRef object streams cause parser to fail + FIX > ----------------------------------------------------- > > Key: PDFBOX-1014 > URL: https://issues.apache.org/jira/browse/PDFBOX-1014 > Project: PDFBox > Issue Type: Bug > Components: Parsing > Affects Versions: 1.6.0 > Reporter: Timo Boehme > Assignee: Andreas Lehmkühler > > I have a PDF document with 3 XRef streams (no xref table; PDF version 1.6). > Currently PDFBOX reads and parses all 3 streams in the order the appear and > combines the data in a dictionary (thus attributes specified in a later XRef > stream overwrite attributes in earlier streams). The problem with my document > is that the first 2 XRef streams declare document encryption while the last > one does not. Furthermore the last one uses another document id thus trying > to decrypt the document would fail because of the different IDs (however > already the parsing of the stream in the first XRef object already fails. > The solution I came up with is to first get all XRef streams, start looking > from last one if it contains a 'Prev' key and go up the list as long as we > have this 'Prev' key. This should work in most cases assuming that multiple > active XRef sections appear in order without an unused XRef section in > between. A really correct solution would have to test for object byte > positions (therefore it would be necessary to store byte positions for each > object). > The fix in COSDocument.parseXrefStreams(): > public void parseXrefStreams() throws IOException > { > COSDictionary trailerDict = new COSDictionary(); > > // use only last XRef and XRef which are referenced by a used XRef > via 'Prev' > // we assume that 'Prev' will reference next preceding xref object > // (otherwise we would have to use object byte positions) > List<COSObject> xrefStreams = getObjectsByType( "XRef" ); > int firstXRefIdx = xrefStreams.size() - 1; > while ( firstXRefIdx > 0 ) { > COSStream stream = (COSStream)xrefStreams.get( firstXRefIdx > ).getObject(); > if ( stream.getInt( COSName.PREV, -1 ) == -1 ) > // no 'Prev' key; current xref object will be first one > we use > break; > } > > // for( COSObject xrefStream : getObjectsByType( "XRef" ) ) > for ( int xrefIdx = firstXRefIdx, len = xrefStreams.size(); xrefIdx < > len; xrefIdx++ ) > { > COSStream stream = (COSStream)xrefStreams.get( xrefIdx > ).getObject(); > trailerDict.addAll(stream); > PDFXrefStreamParser parser = > new PDFXrefStreamParser(stream, this, forceParsing); > parser.parse(); > } > setTrailer( trailerDict ); > } -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira