[ https://issues.apache.org/jira/browse/TIKA-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802344#comment-17802344 ]
Ross Johnson commented on TIKA-4175:
------------------------------------

Hi Tim, I shared a file via email.

In case it is helpful, I also did a bit of investigating of how / when Acrobat will attempt to immediately show the encrypted payload PDF instead of the wrapper document. I was disappointed to learn about the */Root* -> */Collection* -> */D* dictionary property (described in Table 155 of the PDF spec), which may contain the name of a file in the *EmbeddedFiles* name tree that the viewer is supposed to show as the initial document instead of the actual initial document. Removing or changing the name in this *D* property in my sample file causes Acrobat to show only the single "Please use Acrobat" page of the wrapper document.

> Additional IRM-protected PDFs should throw EncryptedDocumentException
> ---------------------------------------------------------------------
>
>                 Key: TIKA-4175
>                 URL: https://issues.apache.org/jira/browse/TIKA-4175
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Ross Johnson
>            Priority: Major
>         Attachments: image-2023-12-20-17-06-29-791.png,
> image-2023-12-20-17-12-09-946.png
>
>
> I've come across some PDFs that use an Adobe IRM scheme, similar to
> TIKA-4082, where a wrapper PDF contains an IRM-protected embedded PDF. These
> wrapper PDFs do not currently throw because the structure is a bit different
> from what is currently being looked for in PDFParser#checkEncryptedPayload().
> As best I can tell, this form of IRM was implemented by Adobe, but is
> licensed to 3rd parties who then can market it as their own form of PDF
> protection. The documents I've seen are from an IRM product from Interlinks,
> but there are likely very similarly protected PDFs from other products.
> Opening the wrapper PDF in Adobe Reader / Acrobat prompts for server
> authentication (shown below). Opening in other viewers shows the wrapper
> splash page, which indicates that the viewer is not secure and to use Adobe
> Reader.
> !image-2023-12-20-17-06-29-791.png!
> The wrapper PDFs I've seen use PDF version 1.4 and have a somewhat generic
> /EmbeddedFiles dictionary:
> !image-2023-12-20-17-12-09-946.png!
> The encrypted PDF payloads I've seen have a somewhat interesting /Encrypt
> dictionary with a Filter value of "Adobe.APS".
> {code:java}
> <<
>   /EDCData (...base64 string...)
>   /CF <<
>     /DefaultCryptFilter<</CFM/AESV3/Length 256>>
>   >>
>   /PDRLLic (...base64 string...)
>   /R 65537
>   /StmF /DefaultCryptFilter
>   /Filter /Adobe.APS
>   /EncryptMetadata true
>   /V 5
>   /StrF /DefaultCryptFilter
>   /PDRLPol (...base64 string...)
>   /SubFilter /adobe.pdrl.v0
> >>
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
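
For illustration only, the two observations in this thread (the */Collection* -> */D* entry naming the initial document, and the payload's /Encrypt dictionary using the "Adobe.APS" filter) can be sketched in plain Java. This is not Tika's or PDFBox's actual code. It assumes a simplified model in which PDF dictionaries are plain maps keyed by entry name without the leading slash and the EmbeddedFiles name tree is flattened into a map. The class name, method names, and the file name `payload.pdf` used when calling it are hypothetical.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified model: PDF dictionaries are plain maps from entry
// name (no leading slash) to value. A real parser would read these entries
// from a PDF library's object model instead.
class IrmWrapperSketch {

    // Filter value reported for the encrypted payloads in this issue.
    static final String ADOBE_APS_FILTER = "Adobe.APS";

    /**
     * Returns the embedded file name referenced by /Root -> /Collection -> /D,
     * but only if that name is present in the (flattened) EmbeddedFiles name tree.
     * An absent or non-matching D entry yields an empty result, which mirrors the
     * observation that Acrobat then falls back to the wrapper page.
     */
    static Optional<String> initialDocument(Map<String, String> collection,
                                            Map<String, byte[]> embeddedFiles) {
        String name = collection.get("D");
        if (name == null || !embeddedFiles.containsKey(name)) {
            return Optional.empty();
        }
        return Optional.of(name);
    }

    /** True if the payload's /Encrypt dictionary uses the Adobe.APS filter. */
    static boolean looksLikeAdobeIrm(Map<String, String> encrypt) {
        return ADOBE_APS_FILTER.equals(encrypt.get("Filter"));
    }
}
```

With a wrapper whose D entry is `payload.pdf` and an embedded file of the same name, `initialDocument` returns that name. Renaming or removing D makes it return an empty result. Whether a check like this belongs in PDFParser#checkEncryptedPayload() is a design question for the Tika maintainers.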