[ https://issues.apache.org/jira/browse/TIKA-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17484907#comment-17484907 ]
August Valera commented on TIKA-3666:
-------------------------------------

Apologies for not being able to provide a sample file at this time. I'm working to get a non-sensitive one (or several in the different Office formats) available for testing.

> Detect and indicate file encrypted with Rights Management Service RMS/IRM
> -------------------------------------------------------------------------
>
>                 Key: TIKA-3666
>                 URL: https://issues.apache.org/jira/browse/TIKA-3666
>             Project: Tika
>          Issue Type: Improvement
>          Components: metadata
>            Reporter: August Valera
>            Priority: Major
>
> Rights Management Service (RMS), implemented in MS Office as Information
> Rights Management (IRM), allows organizations to set file permissions that
> are stored within the file. In most cases, this will result in the file
> getting a new extension (with a prefix p, such as {{.txt}} becoming
> {{.ptxt}}), but in the case of MS Office and PDF files, which support
> this natively, the implementation results in the file contents being
> encrypted without any extension change.
> Current behavior: Running such files through Tika produces results as if
> an empty file had been run through {{DefaultParser}} and {{OfficeParser}}.
> Expected behavior: Extract more metadata about the permissions necessary to
> view the file (if possible), and throw {{EncryptedDocumentException}}, as is
> the case with Office files encrypted in the more traditional manner.
> Reference:
> [https://docs.microsoft.com/en-us/azure/information-protection/rms-client/clientv2-admin-guide-file-types#supported-file-types-for-classification-and-protection]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
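
As a rough illustration of the detection the issue asks for, here is a minimal sketch of how an IRM-protected Office file might be recognized from its raw bytes. It is not taken from Tika. It rests on an assumption drawn from the MS-OFFCRYPTO specification: an IRM-protected Office document is an OLE2 compound file containing a stream named "DRMEncryptedDataSpace", and OLE2 stores such names as UTF-16LE. The class name {{RmsSniffer}} and its methods are hypothetical, and protected PDF files are not covered.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Tika or POI.
final class RmsSniffer {
    // Signature found at the start of every OLE2 compound file.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Assumption (per MS-OFFCRYPTO): IRM-protected files carry this
    // data space name, stored as UTF-16LE inside the container.
    private static final byte[] DRM_MARKER =
        "DRMEncryptedDataSpace".getBytes(StandardCharsets.UTF_16LE);

    private RmsSniffer() {}

    // True if the data begins with the OLE2 signature.
    static boolean isOle2(byte[] data) {
        return data.length >= OLE2_MAGIC.length
            && Arrays.equals(Arrays.copyOf(data, OLE2_MAGIC.length), OLE2_MAGIC);
    }

    // Naive byte search; returns the first match index or -1.
    static int indexOf(byte[] haystack, byte[] needle) {
        for (int i = 0; i + needle.length <= haystack.length; i++) {
            int j = 0;
            while (j < needle.length && haystack[i + j] == needle[j]) {
                j++;
            }
            if (j == needle.length) {
                return i;
            }
        }
        return -1;
    }

    // Heuristic only: an OLE2 container that mentions the DRM data space.
    static boolean looksRmsProtected(byte[] data) {
        return isOle2(data) && indexOf(data, DRM_MARKER) >= 0;
    }
}
```

A byte scan like this is only a heuristic. A parser-level check would more plausibly walk the compound file's directory tree and, on finding the DRM data space, throw {{EncryptedDocumentException}} as the issue proposes.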