[ 
https://issues.apache.org/jira/browse/TIKA-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972525#comment-16972525
 ] 

Tim Allison commented on TIKA-2982:
-----------------------------------

Via Google translate:
>My guess is, is it because of the change of the header file when encrypting?
>I made a mistake, sorry, the format is: application/x-tika-msoffice. This 
>itself represents the encrypted file format, I made it into a doc.

Our POIFS detector finds two streams: EncryptedPackage and EncryptionInfo.  It 
does not find a \u0006DataSpaces entry.

{noformat}
else if (names.contains("EncryptedPackage") &&
        names.contains("EncryptionInfo") &&
        names.contains("\u0006DataSpaces")) {
    // This is a protected OOXML document, which is an OLE2 file
    //  with an Encrypted Stream which holds the OOXML data
    // Without decrypting the stream, we can't tell what kind of
    //  OOXML file we have. Return a general OOXML Protected type,
    //  and hope the name based detection can guess the rest!
    return OOXML_PROTECTED;
} else if (names.contains("EncryptedPackage")) {
    return OLE;
{noformat}

Does anyone remember why we return OLE if there is no DataSpaces entry?
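
To make the reported behaviour concrete, here is a minimal, self-contained sketch of the branch above. It is a simplified stand-in, not the actual detector code: the class name EncryptedOleSketch, the classify method, and the string return values (including "UNKNOWN") are hypothetical placeholders for illustration, and it assumes Java 9+ for Set.of. It only mirrors the two conditions quoted in the snippet.

```java
import java.util.Set;

// Simplified stand-in for the branch quoted above; NOT the real Tika detector.
// The string return values are hypothetical placeholders for Tika's media types.
class EncryptedOleSketch {

    // Mirrors the two conditions from the quoted snippet, applied to the
    // set of top-level entry names found in the OLE2 container.
    static String classify(Set<String> names) {
        if (names.contains("EncryptedPackage")
                && names.contains("EncryptionInfo")
                && names.contains("\u0006DataSpaces")) {
            // All three entries present: treated as protected OOXML.
            return "OOXML_PROTECTED";
        } else if (names.contains("EncryptedPackage")) {
            // EncryptedPackage without the full trio: falls back to plain OLE.
            return "OLE";
        }
        // Placeholder for every other branch of the real detector.
        return "UNKNOWN";
    }

    public static void main(String[] args) {
        // The entries reported for the attached files: no \u0006DataSpaces.
        Set<String> reported = Set.of("EncryptedPackage", "EncryptionInfo");
        System.out.println(classify(reported));

        // The same entries plus \u0006DataSpaces.
        Set<String> withDataSpaces =
                Set.of("EncryptedPackage", "EncryptionInfo", "\u0006DataSpaces");
        System.out.println(classify(withDataSpaces));
    }
}
```

With only EncryptedPackage and EncryptionInfo present, the sketch takes the second branch and returns the OLE placeholder, which matches what the attached files appear to trigger.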

> Tika misidentifies encrypted xlsx, docx, and pptx files as doc
> --------------------------------------------------------------
>
>                 Key: TIKA-2982
>                 URL: https://issues.apache.org/jira/browse/TIKA-2982
>             Project: Tika
>          Issue Type: Bug
>          Components: detector
>    Affects Versions: 1.20
>            Reporter: Feng Jiao Jiang
>            Priority: Blocker
>         Attachments: 1.docx, 1.xlsx, 2.pptx
>
>
> Tika misidentifies encrypted xlsx, docx, and pptx files as doc



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
