[ 
https://issues.apache.org/jira/browse/PDFBOX-3506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15510493#comment-15510493
 ] 

Tilman Hausherr commented on PDFBOX-3506:
-----------------------------------------

Now I get it: you're missing /Classification, /bjDocumentLabelXML, 
/bjLabelRefreshRequired.

The file has been updated, i.e. there are three versions of the /Info 
dictionary. The old parser (the default in 1.8) took the last one; the nonSeq 
parser (the only parser in 2.*) takes the first one. The first one does not 
have the custom metadata mentioned; the last one does.

To demonstrate this, I changed the first character of the author's first name 
"Paritosh" to 1, 2 and 3 in the three versions, respectively. Adobe shows a 
"3", and we show a "1".

The cause:
{code}
xrefTrailerResolver.setXRef(objKey, currOffset);
{code}
is called twice with the same objKey (7 0 R) but different offsets.

The cause is an XRefStm in each trailer. This results in a call to 
parseXrefStream, which does not reset the {{curXrefTrailerObj.xrefTable}} map 
by calling {{xrefTrailerResolver.nextXrefObj}}, per the comment
{code}
        // the cross reference stream of a hybrid xref table will be added to the existing one
        // and we must not override the offset and the trailer
{code}
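
The overwrite can be reproduced without PDFBox. The following is only a minimal sketch, not the parser's code: a plain {{HashMap}} stands in for {{curXrefTrailerObj.xrefTable}}, the string "7 0" stands in for the {{COSObjectKey}}, the class name is hypothetical and the offsets 100 and 200 are made up. It also assumes the xref table entry is registered before the XRefStm entry.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the resolver; this is not PDFBox code.
class XrefOverwriteSketch
{
    // Models curXrefTrailerObj.xrefTable: object key -> byte offset.
    private final Map<String, Long> xrefTable = new HashMap<>();

    // Unconditional put: a later call for the same key replaces the earlier offset.
    void setXRef(String objKey, long offset)
    {
        xrefTable.put(objKey, offset);
    }

    Long getOffset(String objKey)
    {
        return xrefTable.get(objKey);
    }

    // Registers object "7 0" twice, once per source, and returns the surviving offset.
    static long offsetAfterTableThenStream()
    {
        XrefOverwriteSketch resolver = new XrefOverwriteSketch();
        resolver.setXRef("7 0", 100L); // made-up offset from the xref table
        resolver.setXRef("7 0", 200L); // made-up offset from the XRefStm
        return resolver.getOffset("7 0");
    }

    public static void main(String[] args)
    {
        // The second registration wins, so this prints 200.
        System.out.println(offsetAfterTableThenStream());
    }
}
```

Because both calls land in the same map, whichever source is parsed last decides the offset.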

One solution would be to make a change in XrefTrailerResolver and insert 
elements only if they don't exist:
{code}
    public void setXRef( COSObjectKey objKey, long offset )
    {
        if ( curXrefTrailerObj == null )
        {
            // should not happen...
            LOG.warn( "Cannot add XRef entry for '" + objKey.getNumber() + "' because XRef start was not signalled." );
            return;
        }
        if (!curXrefTrailerObj.xrefTable.containsKey(objKey)) // NEW
        {
            curXrefTrailerObj.xrefTable.put(objKey, offset);
        }
    }
{code}
i.e. entries from the xref table take priority over entries from the 
XRefStm.
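
The effect of the {{containsKey}} guard can be checked in isolation. This is again only a sketch under assumptions: a plain {{HashMap}} replaces the resolver's map, string keys replace {{COSObjectKey}}, the class name is hypothetical, the offsets are made up, and the xref table entry is assumed to be registered first.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the patched resolver; this is not PDFBox code.
class XrefFirstWinsSketch
{
    // Models curXrefTrailerObj.xrefTable: object key -> byte offset.
    private final Map<String, Long> xrefTable = new HashMap<>();

    // Guarded put: only the first offset registered for a key is kept.
    void setXRef(String objKey, long offset)
    {
        if (!xrefTable.containsKey(objKey))
        {
            xrefTable.put(objKey, offset);
        }
    }

    Long getOffset(String objKey)
    {
        return xrefTable.get(objKey);
    }

    // Registers object "7 0" twice, once per source, and returns the surviving offset.
    static long offsetAfterTableThenStream()
    {
        XrefFirstWinsSketch resolver = new XrefFirstWinsSketch();
        resolver.setXRef("7 0", 100L); // made-up offset from the xref table
        resolver.setXRef("7 0", 200L); // made-up offset from the XRefStm, ignored
        return resolver.getOffset("7 0");
    }

    public static void main(String[] args)
    {
        // The first registration is kept, so this prints 100.
        System.out.println(offsetAfterTableThenStream());
    }
}
```

Keys that appear only in the XRefStm are still added, since the guard rejects only keys that are already present.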

> Not able to read the custom metadata in trailer section
> -------------------------------------------------------
>
>                 Key: PDFBOX-3506
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3506
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 2.0.3
>         Environment: Windows 7, PDF version 1.5
>            Reporter: Kent Lee
>         Attachments: test.pdf
>
>
> With the code below, I am not able to retrieve the custom metadata stored in 
> the trailer section of the PDF:
> PDDocumentInformation documentInformation = document.getDocumentInformation();
> Set<String> customMetadataKeys = documentInformation.getMetadataKeys();
> PDFBox 1.8.12 does not have this issue.


