[ 
https://issues.apache.org/jira/browse/TIKA-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

August Valera updated TIKA-3666:
--------------------------------
    Description: 
Rights Management Service (RMS), implemented in MS Office as Information Rights 
Management (IRM), allows organizations to set file permissions that are stored 
within the file. In most cases, the protected file gets a new extension 
prefixed with p (for example, {{.txt}} becomes {{.ptxt}}). MS Office and PDF 
files support RMS natively, so their contents are encrypted without any 
extension change.
h4. Current behavior

Running such a file through Tika produces the same results as an empty file 
run through {{DefaultParser}} and {{OfficeParser}}.
h4. Expected behavior

Extract more metadata about the permissions needed to view the file (if 
possible), and throw {{EncryptedDocumentException}}, as is already the case 
for Office files encrypted in the more traditional manner.

Reference: 
[https://docs.microsoft.com/en-us/azure/information-protection/rms-client/clientv2-admin-guide-file-types#supported-file-types-for-classification-and-protection]
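
As a rough illustration of the extension convention described above, the 
following sketch flags files whose names carry a p-prefixed extension. The 
class {{RmsExtensionHeuristic}} and its method are hypothetical and not part 
of Tika. Only {{.ptxt}} comes from this issue; the other extensions in the set 
are assumed from the Microsoft file-type list referenced above and should be 
verified against it. The check cannot catch MS Office and PDF files, which 
keep their original extension and would need content inspection instead.

```java
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical helper (not a Tika class): guesses from the file name alone
 * whether a file was wrapped by RMS protection.
 */
final class RmsExtensionHeuristic {

    // Illustrative subset only. "ptxt" is taken from the issue text; the
    // remaining entries are assumptions based on the referenced Microsoft
    // file-type list and should be checked against it.
    private static final Set<String> PROTECTED_EXTENSIONS =
            Set.of("ptxt", "pxml", "pjpg", "pfile");

    private RmsExtensionHeuristic() {
        // static utility, no instances
    }

    /**
     * Returns true if the file name ends in one of the known protected
     * extensions (case-insensitive). Natively protected Office and PDF
     * files are NOT detected, because their extension does not change.
     */
    static boolean looksRmsProtected(String fileName) {
        if (fileName == null) {
            return false;
        }
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false; // no extension, or name ends with a dot
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return PROTECTED_EXTENSIONS.contains(ext);
    }
}
```

For example, {{RmsExtensionHeuristic.looksRmsProtected("notes.ptxt")}} returns 
true, while the same call with {{"report.docx"}} returns false even if the 
document is RMS-protected, which is the gap this issue is about.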

  was:
Rights Management Service (RMS), implemented in MS Office as Information Rights 
Management (IRM), allows organizations to set file permissions that are stored 
within the file. In most cases, this will result in the file getting a new 
extension (with a prefix p, such as {{.txt}} becoming {{.ptxt}}), but in 
the case of MS Office and PDF files, which support this natively, the 
implementation results in the file contents being encrypted without any 
extension change. 

Current behavior: Running such files through Tika produces results as if it was 
an empty file ran through {{DefaultParser}} and {{OfficeParser}}.

Expected behavior: Extract more metadata about necessary permissions to view 
(if possible), and throwing {{EncryptedDocumentException}} as is the case with 
Office files encrypted in the more traditional manner.

Reference: 
[https://docs.microsoft.com/en-us/azure/information-protection/rms-client/clientv2-admin-guide-file-types#supported-file-types-for-classification-and-protection]


> Detect and indicate file encrypted with Rights Management Service RMS/IRM
> -------------------------------------------------------------------------
>
>                 Key: TIKA-3666
>                 URL: https://issues.apache.org/jira/browse/TIKA-3666
>             Project: Tika
>          Issue Type: Improvement
>          Components: metadata
>            Reporter: August Valera
>            Priority: Major
>
> Rights Management Service (RMS), implemented in MS Office as Information 
> Rights Management (IRM), allows organizations to set file permissions that 
> are stored within the file. In most cases, the protected file gets a new 
> extension prefixed with p (for example, {{.txt}} becomes {{.ptxt}}). MS 
> Office and PDF files support RMS natively, so their contents are encrypted 
> without any extension change.
> h4. Current behavior
> Running such a file through Tika produces the same results as an empty file 
> run through {{DefaultParser}} and {{OfficeParser}}.
> h4. Expected behavior
> Extract more metadata about the permissions needed to view the file (if 
> possible), and throw {{EncryptedDocumentException}}, as is already the case 
> for Office files encrypted in the more traditional manner.
> Reference: 
> [https://docs.microsoft.com/en-us/azure/information-protection/rms-client/clientv2-admin-guide-file-types#supported-file-types-for-classification-and-protection]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
