[ https://issues.apache.org/jira/browse/TIKA-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803215#comment-17803215 ]
Tim Allison commented on TIKA-4175:
-----------------------------------

Thank you for sharing a file. I didn't understand from your description where exactly the Adobe.APS filter lived. My fault, not yours...description is perfectly clear now that I reread it.

Now that I understand this...yuck.

The good news is that when you use the /rmeta endpoint, the -J option with tika-app or the RecursiveParserWrapper, there is an EncryptedDocumentException in the embedded file, as you'd expect.

The file on TIKA-4082 uses /AFRelationship: EncryptedPayload in the container file to signify that there's an encrypted file embedded within the PDF.

In the PDF on this issue, I can't find any sign of the encrypted file in the container file. As you point out, this /EmbeddedFiles is totally nondescript. And, as you point out, the only sign of the encryption is inside the embedded file, which contains the CryptFilter etc.

If the goal is to throw an EncryptedDocumentException at the container document level, I can only think of the following:

The embedded file handler needs to check for and then pass on an EncryptedDocumentException, but only when it is caused by an IOException at the PDFBox level with a message containing "No security handler for filter..."

Any other options?

> Additional IRM-protected PDFs should throw EncryptedDocumentException
> ---------------------------------------------------------------------
>
>                 Key: TIKA-4175
>                 URL: https://issues.apache.org/jira/browse/TIKA-4175
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Ross Johnson
>            Priority: Major
>         Attachments: image-2023-12-20-17-06-29-791.png,
> image-2023-12-20-17-12-09-946.png
>
>
> I've come across some PDFs that use an Adobe IRM scheme, similar to
> TIKA-4082, where a wrapper PDF contains an IRM-protected embedded PDF. These
> wrapper PDFs do not currently throw because the structure is a bit different
> than what is currently being looked for in PDFParser#checkEncryptedPayload().
> As best I can tell, this form of IRM was implemented by Adobe, but is
> licensed to 3rd parties who then can market it as their own form of PDF
> protection. The documents I've seen are from an IRM product from Interlinks,
> but there are likely very similarly protected PDFs from other products.
> Opening the wrapper PDF in Adobe Reader / Acrobat prompts for a server
> authentication (shown below). Opening in other viewers shows the wrapper
> splash page, which indicates that the viewer is not secure and to use Adobe
> Reader.
> !image-2023-12-20-17-06-29-791.png!
> The wrapper PDFs I've seen use PDF version 1.4 and have a somewhat generic
> /EmbeddedFiles dictionary:
> !image-2023-12-20-17-12-09-946.png!
> The encrypted PDF payloads I've seen have a somewhat interesting /Encrypt
> dictionary with a Filter value of "Adobe.APS".
> {code:java}
> <<
> /EDCData (...base64 string...)
> /CF <<
> /DefaultCryptFilter<</CFM/AESV3/Length 256>>
> >>
> /PDRLLic (...base64 string...)
> /R 65537
> /StmF /DefaultCryptFilter
> /Filter /Adobe.APS
> /EncryptMetadata true
> /V 5
> /StrF /DefaultCryptFilter
> /PDRLPol (...base64 string...)
> /SubFilter /adobe.pdrl.v0
> >>
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
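
For reference, the check proposed in the comment (pass an EncryptedDocumentException up to the container only when it traces back to an IOException whose message contains "No security handler for filter") might look roughly like the sketch below. This is only an illustration, not Tika's actual embedded-file handling: the class and method names are hypothetical, the nested EncryptedDocumentException is a local stand-in so the snippet compiles without Tika on the classpath, and the message fragment is taken from the comment, so PDFBox's exact wording may differ between versions.

```java
import java.io.IOException;

// Hypothetical helper; none of these names come from Tika or PDFBox.
final class SecurityHandlerCheck {

    // Message fragment quoted in the comment; the exact PDFBox wording is an assumption.
    static final String NO_SECURITY_HANDLER = "No security handler for filter";

    // Local stand-in for Tika's EncryptedDocumentException so the sketch is self-contained.
    static class EncryptedDocumentException extends Exception {
        EncryptedDocumentException(Throwable cause) {
            super(cause);
        }
    }

    // Walks the cause chain (bounded, in case of a cyclic chain) looking for an
    // IOException whose message contains the "no security handler" fragment.
    static boolean isMissingSecurityHandler(Throwable t) {
        int depth = 0;
        for (Throwable c = t; c != null && depth < 32; c = c.getCause(), depth++) {
            String msg = c.getMessage();
            if (c instanceof IOException && msg != null && msg.contains(NO_SECURITY_HANDLER)) {
                return true;
            }
        }
        return false;
    }

    // What an embedded-file handler might call with the exception from the embedded parse:
    // rethrow only the narrow case described in the comment, otherwise return normally
    // so the caller can keep its existing handling.
    static void rethrowIfUnsupportedEncryption(Exception embeddedFailure)
            throws EncryptedDocumentException {
        if (embeddedFailure instanceof EncryptedDocumentException
                && isMissingSecurityHandler(embeddedFailure)) {
            throw (EncryptedDocumentException) embeddedFailure;
        }
    }
}
```

Matching on an exception message is brittle, so a real change would probably want a test against a file like the one attached to this issue to catch any change in the PDFBox message.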