[ 
https://issues.apache.org/jira/browse/TIKA-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15924133#comment-15924133
 ] 

Tim Allison commented on TIKA-2294:
-----------------------------------

I put your code in a multithreaded unit test, and I'm not able to replicate 
this with 1.15-SNAPSHOT (trunk) or 1.11 
(63351d11c1778d66826693eb7a97114ab7342e78). I tried 10 threads and 100 threads 
on a queue of 10,000 .xlsx, .docx, and .pptx files from our test set. I also tried 
various thread counts against a single .docx file.

{noformat}
TikaInputStream tikaStream = TikaInputStream.get(new FileInputStream(file)); // don't do this in practice!!!
new DefaultDetector().detect(tikaStream, new Metadata());
{noformat}
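A harness of the kind described above might look roughly like the sketch below. It is not the actual unit test from Tika's repository: `ConsistencyCheck` and `distinctResults` are hypothetical names, and to keep the sketch runnable without Tika on the classpath, the detection step is abstracted as a `Callable`. It submits the same task many times to a fixed thread pool and collects every distinct result; a returned set with more than one element means the answer changed from call to call.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical harness (not Tika's own test code): runs one detection task
// many times on a thread pool and reports every distinct answer it saw.
final class ConsistencyCheck {

    static <T> Set<T> distinctResults(Callable<T> task, int threads, int iterations)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Queue all calls up front so they run concurrently on the pool.
            List<Future<T>> futures = new ArrayList<>();
            for (int i = 0; i < iterations; i++) {
                futures.add(pool.submit(task));
            }
            // Collect results on the calling thread; get() rethrows any task
            // failure wrapped in an ExecutionException.
            Set<T> seen = new HashSet<>();
            for (Future<T> f : futures) {
                seen.add(f.get());
            }
            return seen;
        } finally {
            pool.shutdownNow();
        }
    }
}
```

To point it at a real file, the task would open a fresh stream per call and run the detector, for example `() -> { try (TikaInputStream in = TikaInputStream.get(file)) { return new DefaultDetector().detect(in, new Metadata()); } }` with `file` a `java.io.File`. A result set containing both `application/zip` and an OOXML type would reproduce the report.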

If you're able to share the names of the embedded files within your problematic 
files, that _might_ help.
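Since telling OOXML apart from a plain zip depends on what is inside the container, the entry names can be gathered without sharing any content. A minimal sketch using only `java.util.zip` from the JDK (`ZipEntryLister` and `entryNames` are hypothetical names, not part of Tika):

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: prints entry names only, never entry contents.
final class ZipEntryLister {

    // Returns the names of all entries in the zip/OOXML container,
    // in the order the archive lists them.
    static List<String> entryNames(String path) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(path)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ZipEntryLister problem1.docx problem2.xlsx ...
        for (String arg : args) {
            System.out.println(arg);
            for (String name : entryNames(arg)) {
                System.out.println("  " + name);
            }
        }
    }
}
```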

> Tika inconsistently detects ooxml files as zip file sometimes
> -------------------------------------------------------------
>
>                 Key: TIKA-2294
>                 URL: https://issues.apache.org/jira/browse/TIKA-2294
>             Project: Tika
>          Issue Type: Bug
>          Components: mime
>    Affects Versions: 1.11
>         Environment: linux
>            Reporter: chanchal
>
> Tika sometimes incorrectly detects an OOXML file as zip and sometimes correctly 
> detects it as docx/pptx/xlsx.
> Is it possible for this to happen, and if so, how?
> I cannot share the file as it has sensitive content.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
