[
https://issues.apache.org/jira/browse/TIKA-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12870127#action_12870127
]
Chris A. Mattmann commented on TIKA-391:
----------------------------------------
Hi Simon:
OK, I read through the thread on this. Your patch is a bit outdated against
the MimeTypes.java in the current trunk. Also, I think a better way to
achieve what you'd like without breaking backward compatibility with
getMimeType(byte[]) is to simply add a new method (or, alternatively, create a
new Detector) that combines name hinting with magic detection, which is
essentially what you implemented.
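
To make the idea concrete, here is a rough sketch of combining a name hint with
magic detection. It deliberately does not use the Tika classes: the class
HintedMagicDetector, its detect method, and the small extension table are all
hypothetical stand-ins, and the list of magic candidates is assumed to come from
whatever magic matching is already in place. The assumption behind it is that
magic alone can match more than one type for the same bytes (Word and Excel
files share the same OLE2 container signature), so the name is used only to
choose among the types that magic already allows.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch, not Tika code: pick among magic matches using the name.
final class HintedMagicDetector {

    // Assumed extension table, for illustration only.
    private static final Map<String, String> BY_EXTENSION = Map.of(
            "xls", "application/vnd.ms-excel",
            "doc", "application/msword");

    private static final String FALLBACK = "application/octet-stream";

    /**
     * @param magicCandidates types whose magic matched, in priority order
     * @param resourceName    file name hint, may be null
     */
    static String detect(List<String> magicCandidates, String resourceName) {
        String hinted = typeForName(resourceName);
        // The hint wins only when magic detection also accepts that type.
        if (hinted != null && magicCandidates.contains(hinted)) {
            return hinted;
        }
        // Otherwise fall back to the first magic match, deterministically.
        if (!magicCandidates.isEmpty()) {
            return magicCandidates.get(0);
        }
        // No magic match at all: use the name alone, then the generic type.
        return hinted != null ? hinted : FALLBACK;
    }

    private static String typeForName(String name) {
        if (name == null) {
            return null;
        }
        int dot = name.lastIndexOf('.');
        if (dot < 0 || dot == name.length() - 1) {
            return null;
        }
        return BY_EXTENSION.get(name.substring(dot + 1).toLowerCase(Locale.ROOT));
    }
}
```

Keeping this in a separate method or Detector leaves getMimeType(byte[]) untouched,
so existing callers would see no change in behavior.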
Beyond that, we should put together a unit test to demonstrate the behavior
you're seeing (100 calls to tika.detect return different results on the same
XLS file).
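
A minimal sketch of the core of such a test, assuming only the JDK: it runs a
detection call repeatedly and collects the distinct results, so a stable
detector should yield exactly one. The class DetectionStability and its
distinctResults method are hypothetical names, and the detection call is passed
in as a Supplier so the helper itself does not depend on Tika.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical test helper: repeats a detection call and records what it saw.
final class DetectionStability {

    /**
     * Calls detectOnce the given number of times and returns the distinct
     * results in the order they were first seen.
     */
    static Set<String> distinctResults(Supplier<String> detectOnce, int runs) {
        Set<String> seen = new LinkedHashSet<>();
        for (int i = 0; i < runs; i++) {
            seen.add(detectOnce.get());
        }
        return seen;
    }
}
```

In the real test, the supplier would open a fresh stream on testEXCEL.xls, call
tika.detect(is, metadata) as in the report below, and the test would assert that
the returned set has size 1.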
Thoughts?
Cheers,
Chris
> Intermittent errors detecting xls files
> ---------------------------------------
>
> Key: TIKA-391
> URL: https://issues.apache.org/jira/browse/TIKA-391
> Project: Tika
> Issue Type: Bug
> Components: mime
> Affects Versions: 0.6
> Reporter: Simon Tyler
> Assignee: Chris A. Mattmann
> Attachments: MimeTypes.java
>
>
> I am doing some testing of Tika 0.6 and noticed some odd results for the
> testEXCEL.xls file included in the test suite.
> 100 calls to the following code:
>
> InputStream is = new BufferedInputStream(new FileInputStream(filename));
>
> Metadata metadata = new Metadata();
> metadata.set(Metadata.RESOURCE_NAME_KEY, filename);
>
> String type = tika.detect(is, metadata);
>
> results in a match of either application/msword or
> application/vnd.ms-excel, seemingly at random.