[
https://issues.apache.org/jira/browse/TIKA-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15904645#comment-15904645
]
chanchal commented on TIKA-2294:
--------------------------------
Thanks, Nick, for the reply. We are using the default Tika config and depend on
both the tika-core and tika-parsers libraries. We are also supplying the full
content of the file, in the following way:
TikaInputStream tikaStream = TikaInputStream.get(inputstream);
new DefaultDetector().detect(tikaStream, new Metadata());
So the problem is that for a set of files Tika detects the correct MIME type
95% of the time or more, but in a one-off case it detects the same file as
application/zip.
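Since the file cannot be shared, one check that may help is to confirm that the
bytes handed to the detector are really identical on the runs that give
different results. The following is only a sketch using the JDK, not anything
from Tika itself; the class and method names (StreamFingerprint, readAll,
fingerprint) are made up for illustration. It buffers the stream and logs its
length and SHA-256 so a good run and a bad run can be compared.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Tika.
public class StreamFingerprint {

    /** Reads the whole stream into memory (works on Java 8). */
    public static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    /** Returns "<length>:<sha-256 hex>" for the given bytes. */
    public static String fingerprint(byte[] data) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        StringBuilder sb = new StringBuilder().append(data.length).append(':');
        for (byte b : md.digest(data)) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the real upload stream; these are just the zip magic bytes.
        InputStream in = new ByteArrayInputStream(new byte[] {'P', 'K', 3, 4});
        byte[] data = readAll(in);
        System.out.println(fingerprint(data));
        // The same buffered bytes can then be passed on to detection,
        // for example via TikaInputStream.get(data).
    }
}
```

If the fingerprint differs between a run that returns docx/pptx/xlsx and a run
that returns application/zip, the detector is probably not seeing the full file
in the failing case.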
I tried the steps at
https://wiki.apache.org/tika/Troubleshooting%20Tika#Identifying_what_Detectors_your_Tika_install_supports
and it lists the following detectors:
POIFSContainerDetector
ZipContainerDetector
MimeTypes
So I think the parser-based detectors are available.
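For reference, OOXML files are zip packages that carry a [Content_Types].xml
part, and my understanding (an assumption, not checked against the Tika source)
is that container-aware detection has to look inside the archive to tell them
apart from a plain zip. Below is a rough JDK-only sketch for checking what
entries a given stream actually exposes. OoxmlMarkerCheck and its methods are
hypothetical names, and the sample archives are built in memory rather than
taken from any real file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical diagnostic, not part of Tika.
public class OoxmlMarkerCheck {

    /** Lists entry names in the order they appear in the zip stream. */
    public static List<String> entryNames(InputStream in) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry e;
            while ((e = zip.getNextEntry()) != null) {
                names.add(e.getName());
            }
        }
        return names;
    }

    /** True if the package-level content types part is present. */
    public static boolean hasContentTypes(List<String> names) {
        return names.contains("[Content_Types].xml");
    }

    /** Builds a small in-memory zip with the given entry names (test data only). */
    public static byte[] sampleZip(String... names) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String name : names) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write("x".getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] ooxmlLike = sampleZip("[Content_Types].xml", "word/document.xml");
        byte[] plainZip = sampleZip("notes.txt");
        System.out.println(hasContentTypes(entryNames(new ByteArrayInputStream(ooxmlLike))));
        System.out.println(hasContentTypes(entryNames(new ByteArrayInputStream(plainZip))));
    }
}
```

Running the same check on the stream from a failing case would show whether
[Content_Types].xml is readable there at all, without exposing the document's
contents.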
> Tika inconsistently detects ooxml files as zip file sometimes
> -------------------------------------------------------------
>
> Key: TIKA-2294
> URL: https://issues.apache.org/jira/browse/TIKA-2294
> Project: Tika
> Issue Type: Bug
> Components: mime
> Affects Versions: 1.11
> Environment: linux
> Reporter: chanchal
>
> Tika sometimes incorrectly detects an OOXML file as zip and sometimes
> correctly detects it as docx/pptx/xlsx.
> Is it possible for this to happen, and if so, how?
> I cannot share the file as it has sensitive content.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)