[ 
https://issues.apache.org/jira/browse/TIKA-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15904645#comment-15904645
 ] 

chanchal commented on TIKA-2294:
--------------------------------

Thanks, Nick, for the reply. We are using the default Tika config and depend 
on both the tika-core and tika-parsers libraries. We also supply the full 
content of the file, in the following way:

TikaInputStream tikaStream = TikaInputStream.get(inputstream);
new DefaultDetector().detect(tikaStream, new Metadata());

So the problem is that, for a set of files, Tika detects the correct mime type 
95% of the time or more, but in the one-off case it detects the same file as 
application/zip.

I followed 
https://wiki.apache.org/tika/Troubleshooting%20Tika#Identifying_what_Detectors_your_Tika_install_supports
and it reports the following detectors:
POIFSContainerDetector
ZipContainerDetector
MimeTypes

So I think the parser-based detectors are available.
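
One way to double-check that the bytes reaching the detector really are a 
complete OOXML package is a small check that uses only java.util.zip. This is 
a rough sketch, not how Tika's ZipContainerDetector is implemented: the class 
name OoxmlZipCheck, the method classify and the returned labels are all 
hypothetical, and the only assumption about the format is that an OOXML 
package is a zip containing a [Content_Types].xml entry plus a word/, xl/ or 
ppt/ folder.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical diagnostic helper; it does not use or mirror any Tika class.
public class OoxmlZipCheck {

    /**
     * Coarsely classifies a byte array as "not-zip", "plain-zip",
     * "ooxml-word", "ooxml-excel", "ooxml-powerpoint" or "ooxml-other".
     * Throws IOException if the zip data cannot be read to the end.
     */
    public static String classify(byte[] data) throws IOException {
        // A non-empty zip starts with the local file header magic "PK\003\004".
        if (data.length < 4 || data[0] != 'P' || data[1] != 'K'
                || data[2] != 3 || data[3] != 4) {
            return "not-zip";
        }
        boolean hasContentTypes = false;
        String family = null;
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(data))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (name.equals("[Content_Types].xml")) {
                    hasContentTypes = true;
                } else if (name.startsWith("word/")) {
                    family = "ooxml-word";
                } else if (name.startsWith("xl/")) {
                    family = "ooxml-excel";
                } else if (name.startsWith("ppt/")) {
                    family = "ooxml-powerpoint";
                }
            }
        }
        // Without the content-types entry, treat the data as an ordinary zip.
        if (!hasContentTypes) {
            return "plain-zip";
        }
        return family != null ? family : "ooxml-other";
    }
}
```

Calling OoxmlZipCheck.classify(...) on the same bytes that are handed to 
TikaInputStream.get(...) for the one-off case would show whether those bytes 
still contain the OOXML entries at that point.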




> Tika inconsistently detects ooxml files as zip file sometimes
> -------------------------------------------------------------
>
>                 Key: TIKA-2294
>                 URL: https://issues.apache.org/jira/browse/TIKA-2294
>             Project: Tika
>          Issue Type: Bug
>          Components: mime
>    Affects Versions: 1.11
>         Environment: linux
>            Reporter: chanchal
>
> Tika sometimes incorrectly detects an OOXML file as zip and sometimes 
> correctly detects it as docx/pptx/xlsx.
> Is it possible for this to happen, and if so, how?
> I cannot share the file as it has sensitive content.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
