[jira] [Commented] (PDFBOX-1014) Unused XRef object streams cause parser to fail + FIX

Thomas Chojecki (JIRA) Tue, 21 Jun 2011 01:22:14 -0700

    [ 
https://issues.apache.org/jira/browse/PDFBOX-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052420#comment-13052420
 ]


Thomas Chojecki commented on PDFBOX-1014:
-----------------------------------------

Sorting the objects by object ID was also my first attempt at solving this, 
but I don't think it works for all documents: I have one document where that 
order is not the right one. 
I will test your patch against my copy and let you know whether that document 
parses correctly.

I'm also writing a patch at the moment, but I will put it on hold until I 
have tested yours.

Maybe we can hold off on committing the patch for a moment.

> Unused XRef object streams cause parser to fail + FIX
> -----------------------------------------------------
>
>                 Key: PDFBOX-1014
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1014
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 1.6.0
>            Reporter: Timo Boehme
>
> I have a PDF document with 3 XRef streams (no xref table; PDF version 1.6). 
> Currently PDFBox reads and parses all 3 streams in the order they appear and 
> combines the data in a dictionary (thus attributes specified in a later XRef 
> stream overwrite attributes in earlier streams). The problem with my document 
> is that the first 2 XRef streams declare document encryption while the last 
> one does not. Furthermore, the last one uses a different document ID, so 
> trying to decrypt the document would fail because of the different IDs 
> (however, parsing of the stream in the first XRef object already fails 
> before that point).
> The solution I came up with is to first get all XRef streams, start at the 
> last one, check whether it contains a 'Prev' key, and go up the list as long 
> as we find this 'Prev' key. This should work in most cases, assuming that 
> multiple active XRef sections appear in order without an unused XRef section 
> in between. A really correct solution would have to test for object byte 
> positions (therefore it would be necessary to store byte positions for each 
> object). 
> The fix in COSDocument.parseXrefStreams():
>     public void parseXrefStreams() throws IOException
>     {
>         COSDictionary trailerDict = new COSDictionary();
>
>         // use only the last XRef and XRefs which are referenced by a used
>         // XRef via 'Prev'
>         // we assume that 'Prev' will reference the next preceding xref object
>         // (otherwise we would have to use object byte positions)
>         List<COSObject> xrefStreams  = getObjectsByType( "XRef" );
>         // clamp to 0 so that an empty list simply skips both loops below
>         int             firstXRefIdx = Math.max( 0, xrefStreams.size() - 1 );
>         while ( firstXRefIdx > 0 )
>         {
>             COSStream stream =
>                 (COSStream)xrefStreams.get( firstXRefIdx ).getObject();
>             if ( stream.getInt( COSName.PREV, -1 ) == -1 )
>             {
>                 // no 'Prev' key; current xref object will be first one we use
>                 break;
>             }
>             // 'Prev' is present: step back to the preceding xref object
>             // (without this decrement the loop would never terminate)
>             firstXRefIdx--;
>         }
>
> //        for( COSObject xrefStream : getObjectsByType( "XRef" ) )
>         for ( int xrefIdx = firstXRefIdx, len = xrefStreams.size();
>               xrefIdx < len; xrefIdx++ )
>         {
>             COSStream stream =
>                 (COSStream)xrefStreams.get( xrefIdx ).getObject();
>             trailerDict.addAll(stream);
>             PDFXrefStreamParser parser =
>                 new PDFXrefStreamParser(stream, this, forceParsing);
>             parser.parse();
>         }
>         setTrailer( trailerDict );
>     }
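
The issue description notes that a really correct solution would have to use 
byte positions rather than list order. As a rough illustration only, the 
following sketch follows the 'Prev' chain by byte offset. It is not PDFBox 
code: the class XrefChain, the type XrefSection, the method activeOffsets and 
the startXref parameter are all hypothetical names, and it assumes the byte 
offset of every XRef stream has already been recorded during parsing.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of PDFBox.
final class XrefChain
{
    /** Stand-in for one XRef stream: its byte offset and its 'Prev' value. */
    static final class XrefSection
    {
        final long offset;
        final long prev;   // -1 means "no 'Prev' key"

        XrefSection( long offset, long prev )
        {
            this.offset = offset;
            this.prev = prev;
        }
    }

    /**
     * Returns the byte offsets of the XRef sections that are actually in use,
     * oldest first, by starting at startXref and following 'Prev' offsets.
     * Sections that are never reached through the chain are left out.
     */
    static List<Long> activeOffsets( Map<Long, XrefSection> byOffset,
                                     long startXref )
    {
        List<Long> chain = new ArrayList<Long>();
        Set<Long> seen = new HashSet<Long>();   // guards against 'Prev' cycles
        long offset = startXref;
        while ( offset >= 0 && seen.add( offset ) )
        {
            XrefSection section = byOffset.get( offset );
            if ( section == null )
            {
                break;   // 'Prev' points at an offset with no known section
            }
            chain.add( offset );
            offset = section.prev;
        }
        // oldest first, so that later sections overwrite earlier ones
        Collections.reverse( chain );
        return chain;
    }
}
```

With sections recorded at offsets 100, 500 and 900, where only the section at 
900 is referenced as the start and it has no 'Prev', the sketch would select 
just that section, which matches the situation in the reported document. 
Whether this fits into the parser depends on how byte positions end up being 
stored, so treat it as a starting point for discussion.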

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
