[ 
https://issues.apache.org/jira/browse/TIKA-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15905228#comment-15905228
 ] 

chanchal commented on TIKA-2294:
--------------------------------

Thanks Tim and Nick for looking into this.

What I meant was that the same file is detected as both docx and zip: if 
detection runs 20 times on the same file, it returns docx 19 times and zip 
once. This behaviour occurs only for a small number of OOXML files.

We have Tika deployed on multiple machines, and in one of these setups we 
receive zip as the detected mimetype. The machine is not the same each time 
zip is detected, so it does not look like a machine-specific issue.

I have checked online about the thread safety of Tika, but I want to confirm 
once again: is the detector thread safe?
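
One way to narrow this down is to run detection on identical bytes many times 
from several threads and tally the results. Below is a minimal Java sketch of 
such a harness, not a Tika API: DetectionTally, tally and magicOnly are 
hypothetical names, and magicOnly is only a stand-in that checks the ZIP magic 
bytes. In a real test it would be replaced by a call into the shared detector 
instance used in the deployment.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class DetectionTally {

    // Hypothetical stand-in detector: looks only at the ZIP magic bytes "PK\003\004".
    // Swap this for the real detector call when reproducing the issue.
    static String magicOnly(byte[] data) {
        boolean zip = data.length >= 4
                && data[0] == 'P' && data[1] == 'K' && data[2] == 3 && data[3] == 4;
        return zip ? "application/zip" : "application/octet-stream";
    }

    // Runs the detector `runs` times on the same bytes using `threads` worker
    // threads and counts how often each detected type was returned.
    static Map<String, Integer> tally(Function<byte[], String> detector,
                                      byte[] data, int runs, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < runs; i++) {
                Callable<String> task = () -> detector.apply(data);
                futures.add(pool.submit(task));
            }
            Map<String, Integer> counts = new TreeMap<>();
            for (Future<String> f : futures) {
                counts.merge(f.get(), 1, Integer::sum);
            }
            return counts;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: DetectionTally <file>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        // 20 runs on 4 threads, mirroring the "20 detections of one file" scenario.
        System.out.println(tally(DetectionTally::magicOnly, data, 20, 4));
    }
}
```

If the tally for a single file ever contains more than one type, the detector 
call is giving different answers for identical input, which would point 
towards shared state or concurrency rather than the file or the machine.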

Regarding the file, I will check whether I can share it and get back to you.

Thanks,



> Tika inconsistently detects ooxml files as zip file sometimes
> -------------------------------------------------------------
>
>                 Key: TIKA-2294
>                 URL: https://issues.apache.org/jira/browse/TIKA-2294
>             Project: Tika
>          Issue Type: Bug
>          Components: mime
>    Affects Versions: 1.11
>         Environment: linux
>            Reporter: chanchal
>
> Tika sometimes incorrectly detects  ooxml file as zip and sometimes correctly 
> detects as docx/pptx/xlsx.
> Is there a possibility of it happening and how?
> I cannot share the file as it has sensitive content.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
