[ 
https://issues.apache.org/jira/browse/TIKA-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924507#action_12924507
 ] 

Nick Burch commented on TIKA-391:
---------------------------------

application/x-tika-msoffice is the expected result in many cases when you run 
the MIME magic detector on an OLE2 file with no filename specified. The first 
few KB are simply not enough to reliably tell which kind of OLE2-based file 
format it is.
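
To illustrate why magic alone cannot go further, here is a minimal sketch in 
plain Java. It is not Tika's implementation. It relies only on the fact that 
every OLE2 compound document, whether .doc or .xls, opens with the same 
8-byte signature. The class name, the helper names and the 
application/octet-stream fallback are assumptions made up for this example.

```java
import java.util.Arrays;

public class Ole2MagicSketch {
    // The 8-byte signature found at the start of every OLE2 compound document.
    static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };

    // True if the header starts with the OLE2 signature.
    static boolean looksLikeOle2(byte[] header) {
        return header.length >= OLE2_MAGIC.length
            && Arrays.equals(Arrays.copyOf(header, OLE2_MAGIC.length), OLE2_MAGIC);
    }

    // Hypothetical helper: the most a signature-only check can report.
    // It cannot tell Word from Excel, so it returns the generic OLE2 type.
    // The octet-stream fallback is an assumption for this sketch.
    static String detectByMagicOnly(byte[] header) {
        return looksLikeOle2(header)
            ? "application/x-tika-msoffice"
            : "application/octet-stream";
    }

    public static void main(String[] args) {
        // A header that could belong to a .doc or an .xls alike.
        byte[] header = Arrays.copyOf(OLE2_MAGIC, 16);
        System.out.println(detectByMagicOnly(header));
    }
}
```

Telling the formats apart means reading the entries stored inside the 
container, which a signature check like this never does.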

You should really use the ContainerAwareDetector if you don't know the 
filename, as it is able to open the OLE2 container and figure out the type from 
the contents.


> Intermittent errors detecting xls files
> ---------------------------------------
>
>                 Key: TIKA-391
>                 URL: https://issues.apache.org/jira/browse/TIKA-391
>             Project: Tika
>          Issue Type: Bug
>          Components: mime
>    Affects Versions: 0.6
>            Reporter: Simon Tyler
>            Assignee: Chris A. Mattmann
>             Fix For: 0.8
>
>         Attachments: MimeTypes.java
>
>
> I am doing some testing of Tika 0.6 and noticed some odd results for the 
> testEXCEL.xls file included in the test suite. 
> 100 calls to the following code:
>  
>             InputStream is = new BufferedInputStream(new FileInputStream(filename));
>  
>             Metadata metadata = new Metadata();
>             metadata.set(Metadata.RESOURCE_NAME_KEY, filename);
>  
>             String type = tika.detect(is, metadata);
>  
> results in different matches, either application/msword or 
> application/vnd.ms-excel, seemingly at random.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
