[
https://issues.apache.org/jira/browse/PDFBOX-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16180927#comment-16180927
]
Tilman Hausherr commented on PDFBOX-3940:
-----------------------------------------
This regression first occurred because of r187622 in PDFBOX-3923. One of the
offsets is incorrect (it points within the table), so an exception is thrown and
the trailer is rebuilt. When rebuilding, this piece of code is hit:
{code}
// info dictionary
else if (dictionary.containsKey(COSName.MOD_DATE)
        && (dictionary.containsKey(COSName.TITLE)
            || dictionary.containsKey(COSName.AUTHOR)
            || dictionary.containsKey(COSName.SUBJECT)
            || dictionary.containsKey(COSName.KEYWORDS)
            || dictionary.containsKey(COSName.CREATOR)
            || dictionary.containsKey(COSName.PRODUCER)
            || dictionary.containsKey(COSName.CREATION_DATE)))
{
    trailer.setItem(COSName.INFO, document.getObjectFromPool(entry.getKey()));
}
{code}
The "&&" was introduced in PDFBOX-3208 ("ModDate is mandatory for an info
dictionary"). The file 079977.pdf has no /Info/ModDate, and according to the
PDF specification /ModDate is not mandatory.
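To make the effect of the "&&" concrete, here is a minimal sketch of the condition above. It is a hypothetical model, not PDFBox code: a dictionary is represented only by the set of its key names, and the class name CurrentInfoCheck is made up for illustration. Any dictionary without /ModDate is rejected, no matter how many other /Info keys it has.

{code:java}
import java.util.List;
import java.util.Set;

// Hypothetical model of the rebuild heuristic: a dictionary is reduced to the
// set of its key names. This is NOT the PDFBox COSDictionary API.
final class CurrentInfoCheck {

    // The keys from the snippet above that are OR'ed together.
    static final List<String> INFO_KEYS = List.of(
            "Title", "Author", "Subject", "Keywords",
            "Creator", "Producer", "CreationDate");

    // Mirrors the condition: /ModDate AND at least one other info key.
    static boolean looksLikeInfo(Set<String> keys) {
        return keys.contains("ModDate")
                && INFO_KEYS.stream().anyMatch(keys::contains);
    }

    public static void main(String[] args) {
        // A hypothetical /Info dictionary that simply lacks /ModDate.
        Set<String> info = Set.of("Title", "Author", "Producer");
        System.out.println(looksLikeInfo(info)); // false: /Info is not set
    }
}
{code}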
In PDFBOX-3208 the problem was that, without the change there, an outline
dictionary was used as /Info because it had a /Title. I suggest checking for
/Parent instead, to decide that a dictionary is not an /Info. If other
dictionaries have items that are also found in /Info, we'd have to add checks
for those as well.
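A possible shape of the suggested check, again as a minimal sketch over key-name sets rather than real PDFBox code (the class name ProposedInfoCheck is hypothetical). It drops the /ModDate requirement and instead rejects anything with a /Parent, since outline items carry both /Title and /Parent. Treating /ModDate as just one more info key is an assumption of this sketch.

{code:java}
import java.util.List;
import java.util.Set;

// Hypothetical model of the suggested heuristic; dictionaries are reduced to
// the set of their key names. This is NOT the PDFBox API.
final class ProposedInfoCheck {

    // Any of these keys makes a dictionary a candidate /Info
    // (including ModDate here is an assumption of this sketch).
    static final List<String> INFO_KEYS = List.of(
            "Title", "Author", "Subject", "Keywords",
            "Creator", "Producer", "CreationDate", "ModDate");

    // No /ModDate requirement; a /Parent entry rules the dictionary out.
    static boolean looksLikeInfo(Set<String> keys) {
        return !keys.contains("Parent")
                && INFO_KEYS.stream().anyMatch(keys::contains);
    }

    public static void main(String[] args) {
        Set<String> outlineItem = Set.of("Title", "Parent", "Prev", "Next", "Dest");
        Set<String> infoWithoutModDate = Set.of("Title", "Producer");
        System.out.println(looksLikeInfo(outlineItem));        // false
        System.out.println(looksLikeInfo(infoWithoutModDate)); // true
    }
}
{code}

Other dictionary types that share keys with /Info would need their own exclusion rule, in the same way as /Parent here.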
> Lost metadata in 2.0.8-SNAPSHOT
> -------------------------------
>
> Key: PDFBOX-3940
> URL: https://issues.apache.org/jira/browse/PDFBOX-3940
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 2.0.8
> Reporter: Tim Allison
> Labels: regression
> Attachments: 079977.pdf, 2_0_7_079977.pdf.json,
> 2_0_8-SNAPSHOT_079977.pdf.json
>
>
> We noticed some missing metadata values in the recent large scale regression
> testing. I finally had a chance to look. It looks like a genuine regression.
> The difference in the number of metadata values between 2.0.7 and
> 2.0.8-SNAPSHOT is often -2. However, in some files, the problem is more pronounced.
> In the attached file, when we call {{PDDocument.getDocumentInformation()}},
> the returned {{PDDocumentInformation info}} is empty in 2.0.8-SNAPSHOT but
> not in 2.0.7.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)