Tilman Hausherr commented on PDFBOX-3506:

Now I get it: you're missing /Classification, /bjDocumentLabelXML, 

The file has been updated, i.e. there are three versions of the /Info 
dictionary. The old parser (the default in 1.8) took the last one; the nonSeq 
parser (the only parser in 2.*) takes the first one. The first one does not 
have the custom metadata mentioned; the last one has it.

To demonstrate this, I changed the first character of the author's first name 
"Paritosh" to 1, 2 and 3. Adobe shows a "3", and we show a "1".
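
The experiment can be modelled roughly as follows. This is only a sketch, not 
PDFBox code: the class and method names are made up, each map stands in for one 
revision of the /Info dictionary in file order, and the /Classification value 
is a placeholder. Which revision carries the custom entry is taken from the 
description above for the first and last revision only; the middle one is an 
assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a PDF with three /Info revisions (oldest first).
class InfoRevisionDemo {

    // "Last update wins": what the old 1.8 parser and Adobe effectively show.
    static Map<String, String> newest(List<Map<String, String>> revisions) {
        return revisions.get(revisions.size() - 1);
    }

    // "First one wins": the behaviour observed with the 2.* parser.
    static Map<String, String> oldest(List<Map<String, String>> revisions) {
        return revisions.get(0);
    }

    // Builds three revisions whose author name starts with 1, 2 and 3.
    static List<Map<String, String>> sampleRevisions() {
        List<Map<String, String>> revisions = new ArrayList<>();
        for (int i = 1; i <= 3; i++) {
            Map<String, String> info = new LinkedHashMap<>();
            info.put("Author", i + "aritosh");
            if (i == 3) {
                // Assumption: only the last revision has the custom entry.
                info.put("Classification", "placeholder");
            }
            revisions.add(info);
        }
        return revisions;
    }

    public static void main(String[] args) {
        List<Map<String, String>> revisions = sampleRevisions();
        System.out.println(newest(revisions).get("Author"));
        System.out.println(oldest(revisions).get("Author"));
        System.out.println(oldest(revisions).containsKey("Classification"));
    }
}
```

Picking the oldest revision loses every key that was added by a later update, 
which matches the missing custom metadata.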

The cause:
xrefTrailerResolver.setXRef(objKey, currOffset);
is called twice with the same objKey (7 0 R) but different offsets.

The cause is an XRefStm in each trailer. This results in a call to 
parseXrefStream, which does not reset the {{curXrefTrailerObj.xrefTable}} map 
by calling {{xrefTrailerResolver.nextXrefObj}}, per the comment:
        // the cross reference stream of a hybrid xref table will be added to the existing one
        // and we must not override the offset and the trailer

One solution would be to make a change in XrefTrailerResolver and insert 
elements only if they don't exist:
    public void setXRef( COSObjectKey objKey, long offset )
    {
        if ( curXrefTrailerObj == null )
        {
            // should not happen...
            LOG.warn( "Cannot add XRef entry for '" + objKey.getNumber() + "' because XRef start was not signalled." );
            return;
        }
        if (!curXrefTrailerObj.xrefTable.containsKey(objKey)) // NEW
        {
            curXrefTrailerObj.xrefTable.put(objKey, offset);
        }
    }
i.e. entries from the table have a higher priority than entries from the 

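The effect of inserting only when the key is absent can be tried in isolation. 
The following is a minimal sketch, not the PDFBox implementation: the class 
name, the string key "7 0" and the two offsets are invented for illustration, 
and {{putIfAbsent}} is used as a stand-in for the containsKey check (the two 
behave the same for non-null values).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the xrefTable map of one xref/trailer object.
class XrefInsertDemo {

    // Registers the same object key twice with different offsets.
    // keepFirst == false models the plain put(): the later offset wins.
    // keepFirst == true models the guarded insert: the earlier offset stays.
    static Map<String, Long> fill(boolean keepFirst) {
        Map<String, Long> xrefTable = new HashMap<>();
        long[] offsets = {1000L, 2000L}; // made-up offsets for object 7 0
        for (long offset : offsets) {
            if (keepFirst) {
                xrefTable.putIfAbsent("7 0", offset);
            } else {
                xrefTable.put("7 0", offset);
            }
        }
        return xrefTable;
    }

    public static void main(String[] args) {
        System.out.println("plain put:   " + fill(false).get("7 0"));
        System.out.println("guarded put: " + fill(true).get("7 0"));
    }
}
```

With the guard in place, whichever source is parsed first for a given key 
decides its offset, so the parse order determines the priority.
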
> Not able to read the custom metadata in trailer section
> -------------------------------------------------------
>                 Key: PDFBOX-3506
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3506
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 2.0.3
>         Environment: Windows 7, PDF version 1.5
>            Reporter: Kent Lee
>         Attachments: test.pdf
> When using the code below, I am not able to retrieve custom metadata stored in 
> the trailer section of the PDF:
> PDDocumentInformation documentInformation = document.getDocumentInformation();
> Set<String> customMetadataKeys = documentInformation.getMetadataKeys();
> PDFBox 1.8.12 does not have this issue.
