[ https://issues.apache.org/jira/browse/TIKA-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15905066#comment-15905066 ]

Tim Allison commented on TIKA-2294:
-----------------------------------

I concur with Nick. Two other things to try:

1) Have you tried a more recent version of Tika (say, 1.14)?
2) Are there things inside the file that you can share with us:
    2a) When you unzip the ooxml file, what are the file names?
    2b) If there's a file named {{\[Content_Types\].xml}}, can you share that
with us?
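For 2a and 2b, a minimal Java sketch along these lines may help collect the information without sharing the file itself. It uses only {{java.util.zip}} from the JDK (not Tika), and the class and method names ({{OoxmlZipInspector}}, {{entryNames}}, {{contentTypesXml}}) are hypothetical. It lists every entry name in the package, then prints {{\[Content_Types\].xml}} if an entry with exactly that name exists at the root (OOXML packages are expected to carry it there). It assumes that part is UTF-8 encoded.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/**
 * Hypothetical helper, not part of Tika: dumps the entry names of a
 * .docx/.pptx/.xlsx container and the text of its [Content_Types].xml part.
 */
public class OoxmlZipInspector {

    // Exact, case-sensitive name of the content-types part at the zip root.
    static final String CONTENT_TYPES = "[Content_Types].xml";

    /** Returns every entry name, in the order the zip's central directory lists them. */
    public static List<String> entryNames(Path zip) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zf = new ZipFile(zip.toFile())) {
            Enumeration<? extends ZipEntry> entries = zf.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    /** Returns the text of [Content_Types].xml (assumed UTF-8), or null if there is no such entry. */
    public static String contentTypesXml(Path zip) throws IOException {
        try (ZipFile zf = new ZipFile(zip.toFile())) {
            ZipEntry entry = zf.getEntry(CONTENT_TYPES);
            if (entry == null) {
                return null;
            }
            try (InputStream in = zf.getInputStream(entry)) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path zip = Paths.get(args[0]);
        System.out.println("Entries:");
        for (String name : entryNames(zip)) {
            System.out.println("  " + name);
        }
        String ct = contentTypesXml(zip);
        System.out.println(ct == null ? "No " + CONTENT_TYPES + " entry found" : ct);
    }
}
```

Usage: {{java OoxmlZipInspector file1.docx}}. For 2a alone, {{unzip -l file1.docx}} gives the same listing.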

Finally,
{quote}
but for one off case it detects the same file as application/zip
{quote}
Just to confirm, you are _not_ saying that Tika detects {{file1.docx}} as ooxml
on one day but the exact same file as zip the next, right? What you're saying
is that for ~95% of your .docx files, Tika identifies them as ooxml, but for
~5% it identifies them as zip, right?

> Tika inconsistently detects ooxml files as zip file sometimes
> -------------------------------------------------------------
>
>                 Key: TIKA-2294
>                 URL: https://issues.apache.org/jira/browse/TIKA-2294
>             Project: Tika
>          Issue Type: Bug
>          Components: mime
>    Affects Versions: 1.11
>         Environment: linux
>            Reporter: chanchal
>
> Tika sometimes incorrectly detects an ooxml file as zip and sometimes correctly
> detects it as docx/pptx/xlsx.
> Is it possible for this to happen, and if so, how?
> I cannot share the file as it has sensitive content.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)