[ https://issues.apache.org/jira/browse/TIKA-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17174976#comment-17174976 ]
Kenneth William Krugler commented on TIKA-3153:
-----------------------------------------------
I think that for many text-based formats we'll eventually need to support
dedicated detectors (probably triggered off some regex match) to reduce the
increasing number of false positives as we support more formats. Otherwise
every change to the mime type matchers has a significant chance of introducing
a regression.
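
To make the idea concrete, here is a rough sketch of what such a dedicated
check could look like. It is not Tika's actual detection logic and does not use
the Tika API; the class name Rfc822Heuristic, the header list, and the
threshold of two distinct headers are all assumptions chosen for illustration.
The point is only that a stricter, format-specific test (several known header
fields in a well-formed header block) rejects plain text that merely contains
a "Received:" line.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tika: a stricter text-level check for
// message/rfc822 than a single "Received:" match.
final class Rfc822Heuristic {

    // Illustrative (assumed) list of header field names typical for email.
    private static final Set<String> KNOWN_HEADERS = new HashSet<>(Arrays.asList(
            "received", "from", "to", "subject", "date",
            "message-id", "return-path", "mime-version"));

    // A header line: field name, colon, then the field body.
    private static final Pattern HEADER_LINE =
            Pattern.compile("^([A-Za-z][A-Za-z0-9-]*):.*$");

    // Assumed threshold: require at least two distinct known headers.
    private static final int MIN_DISTINCT_HEADERS = 2;

    static boolean looksLikeRfc822(String text) {
        Set<String> seen = new HashSet<>();
        for (String line : text.split("\\r?\\n", -1)) {
            if (line.isEmpty()) {
                break; // a blank line ends the header block
            }
            char first = line.charAt(0);
            if (first == ' ' || first == '\t') {
                continue; // folded continuation of the previous header
            }
            Matcher m = HEADER_LINE.matcher(line);
            if (!m.matches()) {
                return false; // free text inside the header block: not an email
            }
            String name = m.group(1).toLowerCase(Locale.ROOT);
            if (KNOWN_HEADERS.contains(name)) {
                seen.add(name);
            }
        }
        return seen.size() >= MIN_DISTINCT_HEADERS;
    }

    // Falls back to text/plain when the stricter check does not pass.
    static String detect(String text) {
        return looksLikeRfc822(text) ? "message/rfc822" : "text/plain";
    }
}
```

With this sketch, a file whose only email-like feature is one "Received:" line
falls back to text/plain, while a block that starts with Received:, From: and
Subject: headers is still reported as message/rfc822. A real detector would
have to work on a bounded prefix of the input stream and deal with charsets,
which this leaves out.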
As to priority, got me :)
> Text File identified as message/rfc822
> --------------------------------------
>
> Key: TIKA-3153
> URL: https://issues.apache.org/jira/browse/TIKA-3153
> Project: Tika
> Issue Type: Bug
> Components: detector
> Affects Versions: 1.24.1
> Reporter: Akash
> Priority: Major
> Attachments: TextFileIdentifiedAsMessage.txt
>
>
> A text file containing the word Received: is identified as message/rfc822.
> We were previously using version 1.9, which correctly identified the file
> type as text/plain.
> Even when the file has multiple lines, a single line containing Received: is
> enough for the content type to be identified incorrectly.
> To reproduce, run java -jar tika-app-1.24.1.jar
> TextFileIdentifiedAsMessage.txt
--
This message was sent by Atlassian Jira
(v8.3.4#803005)