[ https://issues.apache.org/jira/browse/TIKA-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15924133#comment-15924133 ]
Tim Allison commented on TIKA-2294:
-----------------------------------
I put your code in a multithreaded unit test, and I'm not able to replicate
this with 1.15-SNAPSHOT (trunk) or 1.11
(63351d11c1778d66826693eb7a97114ab7342e78). I tried 10 threads and 100 threads
on a queue of 10,000 .xlsx, .docx, and .pptx files from our test set. I also
tried various thread counts against a single .docx file.
{noformat}
// don't do this in practice!!!
TikaInputStream tikaStream = TikaInputStream.get(new FileInputStream(file));
new DefaultDetector().detect(tikaStream, new Metadata());
{noformat}
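For anyone who wants to try the same kind of check locally, here is a minimal sketch of a multithreaded consistency harness. It is not the actual unit test described above: the class name DetectionConsistencyCheck, the method findInconsistent, and the pluggable detector function are all hypothetical, and the harness uses only the JDK so it runs without Tika on the classpath.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical harness (not part of Tika): detects the same files from many threads. */
class DetectionConsistencyCheck {

    /**
     * Runs the detector on every file `repeats` times using `threads` workers and
     * returns only the files that produced more than one distinct result.
     */
    static Map<Path, Set<String>> findInconsistent(List<Path> files,
                                                   Function<Path, String> detector,
                                                   int threads,
                                                   int repeats) {
        Map<Path, Set<String>> seen = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < repeats; i++) {
                for (Path file : files) {
                    // Record every distinct type reported for this file.
                    futures.add(pool.submit(() ->
                            seen.computeIfAbsent(file, f -> ConcurrentHashMap.newKeySet())
                                .add(detector.apply(file))));
                }
            }
            for (Future<?> future : futures) {
                future.get(); // surfaces any exception thrown inside a worker
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("detector failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        // Keep only files whose results disagreed across runs.
        seen.values().removeIf(types -> types.size() < 2);
        return seen;
    }
}
```

To exercise Tika with this, the detector argument could be a lambda that calls new DefaultDetector().detect(...) on a TikaInputStream opened from the file and returns the MediaType as a string, with the checked IOException wrapped in an unchecked one. That wiring is an assumption and is left out here to keep the sketch JDK-only. An empty result map means every file was detected the same way on every run.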
If you're able to share the names of the embedded files within your problematic
files, that _might_ help.
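Since an OOXML file is a zip container, the entry names can be listed with the JDK alone, without opening any of the document content. A minimal sketch follows; the class name ZipEntryLister and the method entryNames are hypothetical, and the entry names themselves may still need a review before sharing.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical helper (not part of Tika): lists the entry names inside a zip-based file. */
class ZipEntryLister {

    /** Returns the names of all entries in the zip container, without reading their content. */
    static List<String> entryNames(Path file) {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(file.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException("could not read " + file, e);
        }
        return names;
    }

    /** Usage: java ZipEntryLister path/to/file.docx */
    public static void main(String[] args) {
        for (String name : entryNames(Paths.get(args[0]))) {
            System.out.println(name);
        }
    }
}
```

A well-formed docx, pptx, or xlsx file should include an entry named [Content_Types].xml, so checking whether that entry is present in the problematic files is a reasonable first thing to look at.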
> Tika inconsistently detects ooxml files as zip file sometimes
> -------------------------------------------------------------
>
> Key: TIKA-2294
> URL: https://issues.apache.org/jira/browse/TIKA-2294
> Project: Tika
> Issue Type: Bug
> Components: mime
> Affects Versions: 1.11
> Environment: linux
> Reporter: chanchal
>
> Tika sometimes incorrectly detects an ooxml file as zip and sometimes correctly
> detects it as docx/pptx/xlsx.
> Is it possible for this to happen, and if so, how?
> I cannot share the file as it has sensitive content.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)