[ https://issues.apache.org/jira/browse/PDFBOX-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Pat Hickey updated PDFBOX-1872:
-------------------------------
Description:
When the Metadata stream is encoded with the Crypt filter, exportXMPMetadata()
fails to parse the XML. My guess is that PDDocumentCatalog.getMetadata() gives
PDMetadata the raw stream instead of the filtered (decrypted) one.
PDMetadata.exportXMPMetadata() then calls XMPMetadata.load(), which cannot
parse the encrypted stream.
I cannot post the document (it is proprietary), but the outline shown by
PDFDebugger looks like this:
Root:Dictionary(Catalog)
  + AcroForm:Dictionary
  - Metadata:Stream(Metadata:XML)
      - Filter:Array
          o [0] Crypt
      o Length:6302
      o Subtype:XML
      o Type:Metadata
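One way to test the guess above is to look at the bytes that actually reach the XML parser. The helper below is a hypothetical diagnostic written for illustration, not part of PDFBox or JempBox. It assumes a UTF-8 encoded XMP packet (XMP may also be stored as UTF-16 or UTF-32, which this check would wrongly reject) and only tests whether the first non-whitespace byte, after an optional UTF-8 byte order mark, is '<'. Still-encrypted data will almost always fail it. It avoids Java 7 APIs so it also compiles on Java 6.

```java
// Hypothetical diagnostic (not part of PDFBox/JempBox): does a byte buffer
// plausibly hold UTF-8 XML, such as an XMP packet, rather than encrypted data?
final class XmlSniffer {

    private XmlSniffer() {}

    static boolean looksLikeXml(byte[] data) {
        int i = 0;
        // Skip a UTF-8 byte order mark (EF BB BF) if present
        if (data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF) {
            i = 3;
        }
        // Skip leading ASCII whitespace
        while (i < data.length
                && (data[i] == ' ' || data[i] == '\t'
                    || data[i] == '\r' || data[i] == '\n')) {
            i++;
        }
        // Well-formed XML (and an XMP packet) starts with '<'
        return i < data.length && data[i] == '<';
    }

    public static void main(String[] args) {
        byte[] packet = {'<', '?', 'x', 'p', 'a', 'c', 'k', 'e', 't'};
        byte[] cipherLike = {(byte) 0x9A, 0x13, (byte) 0xF0, 0x42};
        System.out.println(looksLikeXml(packet));     // true
        System.out.println(looksLikeXml(cipherLike)); // false
    }
}
```

A false result on the bytes handed to XMPMetadata.load() would support the idea that the Crypt filter was never applied; a true result would point elsewhere.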
was:
My guess is that PDDocumentCatalog.getMetadata() gives PDMetadata the raw
stream, instead of the filtered one. Then PDMetadata.exportXMPMetadata() calls
XMPMetadata.load(), which cannot parse the encrypted stream.
As a workaround, this seems to do the trick (where document is the PDDocument
loaded from the PDF):
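    import org.apache.pdfbox.cos.COSBase;
    import org.apache.pdfbox.cos.COSName;
    import org.apache.pdfbox.cos.COSStream;
    import org.apache.pdfbox.pdmodel.common.PDStream;

    // Read /Metadata straight from the catalog dictionary
    String content = null;
    COSBase md = document.getDocument().getCatalog().getDictionaryObject(
        COSName.METADATA );
    if ( md instanceof COSStream ) {  // also skips a missing or non-stream entry
        PDStream pd = new PDStream( (COSStream) md );
        // read the stream as text; PDStream applies the stream's filters
        content = pd.getInputStreamAsString();
    }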
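Once the workaround has produced content, a quick way to confirm that the decoded stream is well-formed XML is to parse it with the JDK's own JAXP parser. This is a sketch under assumptions: the class and method names are hypothetical, it uses plain DOM rather than JempBox's XMPMetadata.load(), and it returns null instead of throwing when the text does not parse.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper (not part of PDFBox/JempBox): sanity-checks that the
// decoded Metadata text is well-formed XML using the JDK's JAXP parser.
final class XmpCheck {

    private XmpCheck() {}

    // Returns the local name of the root element (e.g. "xmpmeta"),
    // or null if the text is not well-formed XML.
    static String rootElementLocalName(String content) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // XMP depends on namespaces (x:, rdf:, dc:, ...)
            factory.setNamespaceAware(true);
            // XMP never needs a DOCTYPE; refusing one also blocks external entities
            factory.setFeature(
                "http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(content)));
            return doc.getDocumentElement().getLocalName();
        } catch (SAXException e) {
            // Not well-formed, e.g. still-encrypted bytes read as text
            // (the default error handler may also print the error to stderr)
            return null;
        } catch (IOException e) {
            return null;
        } catch (ParserConfigurationException e) {
            // The parser does not support a requested feature
            throw new IllegalStateException(e);
        }
    }
}
```

For a typical XMP packet wrapped in x:xmpmeta, XmpCheck.rootElementLocalName(content) should return "xmpmeta"; running the same check on the text obtained through PDMetadata would show whether that path is handing the parser undecoded data.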
> PDMetadata.exportXMPMetadata fails when Metadata has encrypted stream
> ---------------------------------------------------------------------
>
> Key: PDFBOX-1872
> URL: https://issues.apache.org/jira/browse/PDFBOX-1872
> Project: PDFBox
> Issue Type: Bug
> Components: JempBox, PDModel
> Affects Versions: 1.8.3
> Environment: Not sure it matters, but Solaris (SunOS 5.10), Java 1.6.0_19
> Reporter: Pat Hickey
> Priority: Minor
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)