[ https://issues.apache.org/jira/browse/TIKA-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15905066#comment-15905066 ]
Tim Allison commented on TIKA-2294:
-----------------------------------
I concur with Nick. Two other things to try:
1) Have you tried a more recent version of Tika (say, 1.14)?
2) Is there anything inside the file that you can share with us?
2a) When you unzip the OOXML file, what are the file names?
2b) If there's a file named {{\[Content_Types\].xml}}, can you share that with us?
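For 2a and 2b, here is a minimal sketch that lists the entry names inside a suspect file and reports whether a {{\[Content_Types\].xml}} entry is present at the archive root. It uses only {{java.util.zip}}, not Tika. The class name {{ListOoxmlEntries}} and its helper methods are hypothetical. The check rests on the assumption that a well-formed OOXML (OPC) package carries {{\[Content_Types\].xml}} at its root, so a file that lacks it (or stores it under a different name) is a plausible candidate for falling back to plain zip.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: dumps zip entry names so they can be shared
// without sharing the document's actual content.
public class ListOoxmlEntries {

    // Returns the names of all entries in the zip container, in archive order.
    static List<String> entryNames(String path) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(path)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    // Assumption: an OOXML (OPC) package is expected to have
    // [Content_Types].xml as a root-level entry.
    static boolean hasContentTypes(List<String> names) {
        return names.contains("[Content_Types].xml");
    }

    public static void main(String[] args) throws IOException {
        List<String> names = entryNames(args[0]);
        names.forEach(System.out::println);
        System.out.println("[Content_Types].xml present: " + hasContentTypes(names));
    }
}
```

Run it as {{java ListOoxmlEntries file1.docx}}. Only entry names are printed, so the output should normally be safe to post here even when the file itself is sensitive.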
Finally,
{quote}
but for one off case it detects the same file as application/zip
{quote}
Just to confirm: you are _not_ saying that Tika detects {{file1.docx}} as OOXML
one day but the exact same file as zip the next, right? What you're saying
is that Tika identifies ~95% of your .docx files as OOXML, but identifies
~5% as zip...right?
> Tika inconsistently detects ooxml files as zip file sometimes
> -------------------------------------------------------------
>
> Key: TIKA-2294
> URL: https://issues.apache.org/jira/browse/TIKA-2294
> Project: Tika
> Issue Type: Bug
> Components: mime
> Affects Versions: 1.11
> Environment: linux
> Reporter: chanchal
>
> Tika sometimes incorrectly detects ooxml file as zip and sometimes correctly
> detects as docx/pptx/xlsx.
> Is there a possibility of it happening and how?
> I cannot share the file as it has sensitive content.