Pascal Essiembre created TIKA-2922:
--------------------------------------
Summary: Regression issue with detecting .dotx and .xlam MS Office
mime-types
Key: TIKA-2922
URL: https://issues.apache.org/jira/browse/TIKA-2922
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.22
Environment: N/A
Reporter: Pascal Essiembre
After upgrading to Tika 1.22, .dotx and .xlam files are no longer detected correctly.
They are now detected as:
{noformat}
.dotx -> application/vnd.ms-word.template.macroenabled.12
.xlam -> application/x-tika-ooxml{noformat}
They should be detected as they were before the upgrade:
{noformat}
.dotx -> application/vnd.openxmlformats-officedocument.wordprocessingml.template
.xlam -> application/vnd.ms-excel.addin.macroenabled.12{noformat}
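To make the regression easy to check, here is a minimal, self-contained sketch that compares detected types against the expected ones listed above. The class and method names are hypothetical and not part of Tika. The "application/" prefix on the .dotx types is an assumption (full media type names). The detected values would come from whatever detection call your application already uses.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Tika: compares detected MIME types
// against the types this report says should be returned.
class DetectionRegressionCheck {

    // Expected types keyed by file extension, taken from this report.
    // Assumption: the full media type names carry the "application/" prefix.
    static final Map<String, String> EXPECTED = new LinkedHashMap<>();
    static {
        EXPECTED.put(".dotx",
                "application/vnd.openxmlformats-officedocument.wordprocessingml.template");
        EXPECTED.put(".xlam",
                "application/vnd.ms-excel.addin.macroenabled.12");
    }

    // Returns one message per extension whose detected type differs from the
    // expected one. A missing entry in 'detected' also counts as a mismatch.
    static List<String> mismatches(Map<String, String> detected) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : EXPECTED.entrySet()) {
            String actual = detected.get(e.getKey());
            if (!e.getValue().equalsIgnoreCase(actual)) {
                out.add(e.getKey() + ": expected " + e.getValue() + " but got " + actual);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The values reported for 1.22 in this issue.
        Map<String, String> detected = new LinkedHashMap<>();
        detected.put(".dotx", "application/vnd.ms-word.template.macroenabled.12");
        detected.put(".xlam", "application/x-tika-ooxml");
        for (String line : mismatches(detected)) {
            System.out.println(line);
        }
    }
}
```

With the 1.22 values from this report, both extensions are flagged. Once the mapping is corrected, the list should come back empty.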
Reference:
[https://docs.microsoft.com/en-us/previous-versions/office/office-2007-resource-kit/ee309278(v=office.12)]
The faulty mapping is in StreamingZipContainerDetector and ZipContainerDetectorBase.
I will submit a pull request shortly with the correct mapping.
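For illustration only, the kind of mapping involved could be sketched as a lookup from the OOXML main part content type to the media type that should be reported. This is not the actual Tika code or the contents of the pull request. The class name, the map, and the main part content type strings are my assumptions, and the fallback simply mirrors the generic type seen for .xlam above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the Tika implementation: maps the content type of
// an OOXML package's main part to the media type that should be reported.
class OoxmlTypeMappingSketch {

    // Generic type reported when no specific mapping is known.
    static final String FALLBACK = "application/x-tika-ooxml";

    // Assumption: main part content type strings as declared in [Content_Types].xml.
    static final Map<String, String> MAIN_PART_TO_MIME = new HashMap<>();
    static {
        // Word template (.dotx)
        MAIN_PART_TO_MIME.put(
                "application/vnd.openxmlformats-officedocument.wordprocessingml.template.main+xml",
                "application/vnd.openxmlformats-officedocument.wordprocessingml.template");
        // Macro-enabled Word template (.dotm), the type .dotx was confused with
        MAIN_PART_TO_MIME.put(
                "application/vnd.ms-word.template.macroEnabledTemplate.main+xml",
                "application/vnd.ms-word.template.macroenabled.12");
        // Excel add-in (.xlam)
        MAIN_PART_TO_MIME.put(
                "application/vnd.ms-excel.addin.macroEnabled.main+xml",
                "application/vnd.ms-excel.addin.macroenabled.12");
    }

    // Looks up the media type for a main part content type. Unknown or null
    // content types fall back to the generic OOXML type.
    static String detect(String mainPartContentType) {
        if (mainPartContentType == null) {
            return FALLBACK;
        }
        return MAIN_PART_TO_MIME.getOrDefault(mainPartContentType, FALLBACK);
    }

    public static void main(String[] args) {
        System.out.println(detect("application/vnd.ms-excel.addin.macroEnabled.main+xml"));
    }
}
```

A missing or swapped entry in a table like this would produce exactly the two symptoms above: one file type falling through to the generic type, and another being reported as its macro-enabled sibling.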