[ https://issues.apache.org/jira/browse/TIKA-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525134#comment-17525134 ]
Tim Allison commented on TIKA-3666:
-----------------------------------

I made this change locally and ran it against the msoffice files in our regression set. I didn't see any new exceptions: https://corpora.tika.apache.org/base/reports/reports-20220419-msoffice.tgz

I think we should do this as a last resort if there's an EncryptedPackage that we haven't yet identified, and if we haven't already identified the OLE2 type.

> Detect and indicate file encrypted with Rights Management Service RMS/IRM
> -------------------------------------------------------------------------
>
> Key: TIKA-3666
> URL: https://issues.apache.org/jira/browse/TIKA-3666
> Project: Tika
> Issue Type: Improvement
> Components: metadata
> Reporter: August Valera
> Priority: Major
> Attachments: poifsviewer.txt, sam-poifsviewer.txt
>
>
> Rights Management Service (RMS), implemented in MS Office as Information
> Rights Management (IRM), allows organizations to set file permissions that
> are stored within the file. In most cases, this will result in the file
> getting a new extension (with a prefix p, such as {{.txt}} becoming
> {{.ptxt}}), but in the case of MS Office and PDF files, which support
> this natively, the implementation results in the file contents being
> encrypted without any extension change.
> h4. Current behavior
> Running such files through Tika produces results as if they were empty files
> run through {{DefaultParser}} and {{OfficeParser}}.
> h4. Expected behavior
> Extract more metadata about the permissions necessary to view (if possible), and
> throw {{EncryptedDocumentException}}, as is the case with Office files
> encrypted in the more traditional manner.
> Reference:
> [https://docs.microsoft.com/en-us/azure/information-protection/rms-client/clientv2-admin-guide-file-types#supported-file-types-for-classification-and-protection]

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
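
For illustration only, the "last resort" ordering proposed in the comment might be sketched as below. This is not the change that was tested and it does not use Tika's or POI's API: the class name, the method, and the two type labels are hypothetical, and the sketch assumes that earlier detection steps have already run and that the caller passes in the names of the container's top-level entries. Only the {{EncryptedPackage}} entry name comes from the comment.

```java
import java.util.Set;

// Hypothetical sketch: names and labels are assumptions, not Tika's API.
class LastResortDetector {
    // Placeholder labels, not real Tika media types.
    static final String GENERIC_OLE2 = "generic-ole2";
    static final String RMS_PROTECTED = "rms-protected";

    /**
     * @param rootEntryNames names of the top-level entries in the OLE2 container
     * @param detectedSoFar  the type that earlier detection steps settled on
     * @return the type to report after the last-resort check
     */
    static String detect(Set<String> rootEntryNames, String detectedSoFar) {
        // If an earlier step already identified a specific OLE2 type, keep it.
        if (!GENERIC_OLE2.equals(detectedSoFar)) {
            return detectedSoFar;
        }
        // Last resort: an EncryptedPackage entry that nothing else claimed.
        if (rootEntryNames.contains("EncryptedPackage")) {
            return RMS_PROTECTED;
        }
        return detectedSoFar;
    }
}
```

Running the check last means a file that an earlier step has already classified is never relabelled, which fits the report above of no new exceptions on the regression set.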