[jira] [Commented] (PDFBOX-1792) Metadata not completely extracted with NonSequentialPDFParser on some documents

Thomas Chojecki (JIRA) Tue, 03 Dec 2013 09:37:26 -0800

    [ 
https://issues.apache.org/jira/browse/PDFBOX-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837927#comment-13837927
 ]


Thomas Chojecki commented on PDFBOX-1792:
-----------------------------------------

The test is included in the attached archive (patch.txt).

Here is the relevant part:

// Load the same file once with each parser.
PDDocument seqDoc = PDDocument.load(f);
PDDocument nonSeqDoc = PDDocument.loadNonSeq(f, new RandomAccessBuffer());
PDDocumentInformation seqInfo = seqDoc.getDocumentInformation();
PDDocumentInformation nonSeqInfo = nonSeqDoc.getDocumentInformation();
// Both parsers should report the same number of metadata keys ...
assertEquals("Metadata item count",
    seqInfo.getMetadataKeys().size(),
    nonSeqInfo.getMetadataKeys().size());
// ... and the same value for every key the sequential parser found.
for (String name : seqInfo.getMetadataKeys()) {
  assertEquals(f.getName() + " :: " + name,
      seqInfo.getCustomMetadataValue(name),
      nonSeqInfo.getCustomMetadataValue(name));
}
seqDoc.close();
nonSeqDoc.close();
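
The assertions above stop at the first mismatch. When triaging, it may help to see every differing key at once. The following is only a minimal sketch and is not part of patch.txt: the class name MetadataDiff and the method diff are hypothetical, and it assumes the keys and values from getMetadataKeys() and getCustomMetadataValue() have already been copied into plain maps, so it has no PDFBox dependency of its own.

```java
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

// Hypothetical helper, not part of PDFBox or of patch.txt.
public class MetadataDiff {

    // Returns one entry per key that is missing on either side or whose
    // values differ. Keys are sorted so the report is stable between runs.
    static Map<String, String> diff(Map<String, String> seq,
                                    Map<String, String> nonSeq) {
        Map<String, String> report = new TreeMap<String, String>();
        for (Map.Entry<String, String> e : seq.entrySet()) {
            String key = e.getKey();
            if (!nonSeq.containsKey(key)) {
                report.put(key, "missing in non-sequential result");
            } else if (!Objects.equals(e.getValue(), nonSeq.get(key))) {
                report.put(key, "sequential=" + e.getValue()
                        + ", non-sequential=" + nonSeq.get(key));
            }
        }
        // Keys that only the non-sequential parser produced.
        for (String key : nonSeq.keySet()) {
            if (!seq.containsKey(key)) {
                report.put(key, "missing in sequential result");
            }
        }
        return report;
    }
}
```

An empty map from diff(...) would mean both parsers agree; otherwise each entry names one key to look at.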

> Metadata not completely extracted with NonSequentialPDFParser on some 
> documents
> -------------------------------------------------------------------------------
>
>                 Key: PDFBOX-1792
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1792
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 1.8.3
>            Reporter: Tim Allison
>            Priority: Minor
>         Attachments: PDFBOX-1792.tar.gz
>
>
> The traditional parser is able to extract metadata from the Annotation test 
> document from TIKA-738.  The NonSequentialPDFParser is not able to extract 
> metadata.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
