[ 
https://issues.apache.org/jira/browse/PDFBOX-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pat Hickey updated PDFBOX-1872:
-------------------------------

    Description: 
When the Metadata stream is encoded with the Crypt filter, 
PDMetadata.exportXMPMetadata() fails to parse the XML. My guess is that 
PDDocumentCatalog.getMetadata() gives PDMetadata the raw (still encoded) 
stream instead of the decoded one. PDMetadata.exportXMPMetadata() then calls 
XMPMetadata.load(), which cannot parse the encrypted stream.  
While I cannot post the document (proprietary), the outline shown by 
PDFDebugger goes like this:
Root:Dictionary(Catalog)
  + AcroForm:Dictionary
  - Metadata:Stream(Metadata:XML)
      - Filter:Array
          o [0] Crypt
      o Length:6302
      o Subtype:XML
      o Type:Metadata
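
Since the document itself cannot be shared, the suspected failure mode can be 
sketched without it. The following is a minimal, self-contained illustration 
that does not use PDFBox at all: the class name RawVersusDecodedSketch, the 
XOR stand-in for the Crypt filter, and the sample XMP string are all 
hypothetical. It only shows that an XML parser rejects still-encoded bytes 
and accepts the same bytes once decoded, which is what the guess above 
assumes happens inside XMPMetadata.load().

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

public class RawVersusDecodedSketch {

    // Hypothetical stand-in for the Crypt filter: XOR every byte with a key.
    // Real PDF encryption is far more involved; this only makes the bytes
    // unreadable in a reversible way.
    public static byte[] xor(byte[] data, byte key) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ key);
        }
        return out;
    }

    // Returns true when the bytes parse as well-formed XML, false otherwise.
    public static boolean parsesAsXml(byte[] bytes) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Made-up XMP packet, standing in for the real Metadata stream.
        byte[] xmp = "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\"><note>demo</note></x:xmpmeta>"
                .getBytes(StandardCharsets.UTF_8);

        byte[] raw = xor(xmp, (byte) 0x5A);     // what the raw stream would hold
        byte[] decoded = xor(raw, (byte) 0x5A); // what the decoded stream would hold

        System.out.println("raw parses:     " + parsesAsXml(raw));
        System.out.println("decoded parses: " + parsesAsXml(decoded));
    }
}
```

Running main should report that the raw bytes do not parse and the decoded 
bytes do. Whether PDMetadata really hands over the raw stream is still only a 
guess; the sketch just shows why that would produce a parse failure.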



  was:
My guess is that PDDocumentCatalog.getMetadata() gives PDMetadata the raw 
stream, instead of the filtered one. Then PDMetadata.exportXMPMetadata() calls 
XMPMetadata.load(), which cannot parse the encrypted stream.  
As a workaround, this seems to do the trick (where document is the PDDocument 
loaded from the PDF):
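    String content = null;
    // Fetch the Metadata stream directly from the catalog dictionary.
    COSStream md = (COSStream) document.getDocument().getCatalog()
            .getDictionaryObject( COSName.METADATA );
    if ( md != null ) {
        // Read it through PDStream rather than PDMetadata.
        PDStream pd = new PDStream( md );
        content = pd.getInputStreamAsString();
    }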



> PDMetadata.exportXMPMetadata fails when Metadata has encrypted stream
> ---------------------------------------------------------------------
>
>                 Key: PDFBOX-1872
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1872
>             Project: PDFBox
>          Issue Type: Bug
>          Components: JempBox, PDModel
>    Affects Versions: 1.8.3
>         Environment: Not sure it matters, but Solaris (SunOS 5.10), java 
> 1.6.0_19,
>            Reporter: Pat Hickey
>            Priority: Minor
>
> When the Metadata stream is encoded with the Crypt filter, 
> PDMetadata.exportXMPMetadata() fails to parse the XML. My guess is that 
> PDDocumentCatalog.getMetadata() gives PDMetadata the raw (still encoded) 
> stream instead of the decoded one. PDMetadata.exportXMPMetadata() then calls 
> XMPMetadata.load(), which cannot parse the encrypted stream.  
> While I cannot post the document (proprietary), the outline shown by 
> PDFDebugger goes like this:
> Root:Dictionary(Catalog)
>   + AcroForm:Dictionary
>   - Metadata:Stream(Metadata:XML)
>       - Filter:Array
>           o [0] Crypt
>       o Length:6302
>       o Subtype:XML
>       o Type:Metadata



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
