[ https://issues.apache.org/jira/browse/PDFBOX-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837927#comment-13837927 ]
Thomas Chojecki commented on PDFBOX-1792:
-----------------------------------------
The test is included in the archive (patch.txt).
Here is the relevant part:
    // Load the same file with the sequential and the non-sequential parser.
    PDDocument seqDoc = PDDocument.load(f);
    PDDocument nonSeqDoc = PDDocument.loadNonSeq(f, new RandomAccessBuffer());

    PDDocumentInformation seqInfo = seqDoc.getDocumentInformation();
    PDDocumentInformation nonSeqInfo = nonSeqDoc.getDocumentInformation();

    // Both parsers should report the same number of metadata keys.
    assertEquals("Metadata item count", seqInfo.getMetadataKeys().size(),
            nonSeqInfo.getMetadataKeys().size());

    // Every value found by the sequential parser should match the
    // value found by the non-sequential parser.
    for (String name : seqInfo.getMetadataKeys()) {
        assertEquals(f.getName() + " :: " + name,
                seqInfo.getCustomMetadataValue(name),
                nonSeqInfo.getCustomMetadataValue(name));
    }

    seqDoc.close();
    nonSeqDoc.close();
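
The count assertion above only says that the two parsers disagree, not which keys are affected. As a rough sketch of how the comparison could name the differing keys, the helper below diffs two plain maps. MetadataDiff and diff are hypothetical names used for illustration and are not part of PDFBox; the sketch assumes the metadata of each document has first been copied into a Map<String, String>.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration only; not part of PDFBox.
final class MetadataDiff {

    /**
     * Returns one line per key whose value differs between the two maps.
     * A key that is present in only one map is reported with "null" on
     * the other side. Keys are visited in sorted order.
     */
    static List<String> diff(Map<String, String> seq, Map<String, String> nonSeq) {
        // Union of both key sets, so keys missing on either side are covered.
        Set<String> keys = new TreeSet<String>(seq.keySet());
        keys.addAll(nonSeq.keySet());

        List<String> differences = new ArrayList<String>();
        for (String key : keys) {
            String a = seq.get(key);
            String b = nonSeq.get(key);
            boolean same = (a == null) ? (b == null) : a.equals(b);
            if (!same) {
                differences.add(key + ": sequential=" + a + ", nonSequential=" + b);
            }
        }
        return differences;
    }
}
```

In the test, each map could be filled by looping over getMetadataKeys() and storing getCustomMetadataValue(name), and the returned list could then be asserted to be empty, so that a failure message lists the affected keys.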
> Metadata not completely extracted with NonSequentialPDFParser on some
> documents
> -------------------------------------------------------------------------------
>
> Key: PDFBOX-1792
> URL: https://issues.apache.org/jira/browse/PDFBOX-1792
> Project: PDFBox
> Issue Type: Bug
> Components: PDModel
> Affects Versions: 1.8.3
> Reporter: Tim Allison
> Priority: Minor
> Attachments: PDFBOX-1792.tar.gz
>
>
> The traditional parser is able to extract metadata from the Annotation test
> document from TIKA-738. The NonSequentialPDFParser is not able to extract
> metadata.