[
https://issues.apache.org/jira/browse/TIKA-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972525#comment-16972525
]
Tim Allison commented on TIKA-2982:
-----------------------------------
Via Google translate:
>My guess is, is it because of the change of the header file when encrypting?
>I made a mistake, sorry, the format is: application/x-tika-msoffice. This
>itself represents the encrypted file format, I made it into a doc.
Our POIFS detector finds two streams: EncryptedPackage and EncryptionInfo. It
does not find a \u0006DataSpaces entry.
{noformat}
else if (names.contains("EncryptedPackage") &&
        names.contains("EncryptionInfo") &&
        names.contains("\u0006DataSpaces")) {
    // This is a protected OOXML document, which is an OLE2 file
    // with an Encrypted Stream which holds the OOXML data
    // Without decrypting the stream, we can't tell what kind of
    // OOXML file we have. Return a general OOXML Protected type,
    // and hope the name based detection can guess the rest!
    return OOXML_PROTECTED;
} else if (names.contains("EncryptedPackage")) {
    return OLE;
{noformat}
Does anyone remember why we return OLE if there is no DataSpaces entry?
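To make the fall-through concrete, here is a minimal standalone sketch of the two branches quoted above. It is not the Tika detector itself: the class name, the Result enum, and the classify method are made up for illustration, and the only inputs are the entry names mentioned in this comment.

```java
import java.util.Set;

// Hypothetical stand-in for the two branches quoted above; not Tika code.
public class EncryptedEntrySketch {

    // Illustrative labels only; the real detector returns MediaType constants.
    enum Result { OOXML_PROTECTED, OLE, OTHER }

    static Result classify(Set<String> names) {
        if (names.contains("EncryptedPackage")
                && names.contains("EncryptionInfo")
                && names.contains("\u0006DataSpaces")) {
            // All three entries present: treated as protected OOXML.
            return Result.OOXML_PROTECTED;
        } else if (names.contains("EncryptedPackage")) {
            // EncryptedPackage present, but at least one of the other
            // two entries is missing: falls back to generic OLE.
            return Result.OLE;
        }
        // Anything else is outside the excerpt quoted in this comment.
        return Result.OTHER;
    }

    public static void main(String[] args) {
        // Entry names reported for the attached files: no DataSpaces entry.
        Set<String> reported = Set.of("EncryptedPackage", "EncryptionInfo");
        System.out.println(classify(reported)); // prints OLE

        // Same entries plus the DataSpaces entry.
        Set<String> withDataSpaces =
                Set.of("EncryptedPackage", "EncryptionInfo", "\u0006DataSpaces");
        System.out.println(classify(withDataSpaces)); // prints OOXML_PROTECTED
    }
}
```

With only EncryptedPackage and EncryptionInfo present, the second branch is taken, which matches the application/x-tika-msoffice result reported above.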
> Tika misidentifies encrypted xlsx, docx, and pptx files as doc
> ---------------------------------------
>
> Key: TIKA-2982
> URL: https://issues.apache.org/jira/browse/TIKA-2982
> Project: Tika
> Issue Type: Bug
> Components: detector
> Affects Versions: 1.20
> Reporter: Feng Jiao Jiang
> Priority: Blocker
> Attachments: 1.docx, 1.xlsx, 2.pptx
>
>
> Tika misidentifies encrypted xlsx, docx, and pptx files as doc