[ https://issues.apache.org/jira/browse/TIKA-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17174976#comment-17174976 ]

Kenneth William Krugler commented on TIKA-3153:
-----------------------------------------------

I think that for many text-based formats we'll eventually need to support 
dedicated detectors (probably triggered off some regex match) to reduce the 
growing number of false positives as we support more formats. Otherwise, 
every change to the MIME type matchers has a significant chance of introducing 
a regression.
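As a rough illustration of the idea, here is a minimal sketch of such a dedicated detector for RFC 822-style messages. It is written in plain Java, not against Tika's Detector interface. The class name, the list of "known" header names, and the threshold of three distinct headers are all hypothetical choices made for this example and do not come from Tika. The sketch does not fire on a single Received: line anywhere in the file. It only reports message/rfc822 when the text opens with a block of header-shaped lines that names several distinct message headers.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not Tika code: a stricter heuristic for message/rfc822.
public class Rfc822HeaderHeuristic {

    // A header-shaped line: a field name of printable ASCII (no space, no colon), then ':'.
    private static final Pattern HEADER_LINE = Pattern.compile("[!-9;-~]+:.*");

    // Assumed list of header names that count as strong evidence of a message.
    private static final Pattern KNOWN_HEADER = Pattern.compile(
            "(Received|From|To|Subject|Date|Message-ID|Return-Path|MIME-Version):.*",
            Pattern.CASE_INSENSITIVE);

    // True if the leading header block contains at least minKnown distinct known headers.
    public static boolean looksLikeMessage(String text, int minKnown) {
        Set<String> seen = new HashSet<>();
        boolean inHeaders = false;
        for (String line : text.split("\r?\n")) {
            if (line.isEmpty()) {
                break; // a blank line ends the header block
            }
            if (line.startsWith(" ") || line.startsWith("\t")) {
                if (!inHeaders) {
                    return false; // a folded continuation line with no header before it
                }
                continue;
            }
            if (!HEADER_LINE.matcher(line).matches()) {
                return false; // ordinary prose before the blank line: not a header block
            }
            inHeaders = true;
            Matcher m = KNOWN_HEADER.matcher(line);
            if (m.matches()) {
                seen.add(m.group(1).toLowerCase(Locale.ROOT));
            }
        }
        return seen.size() >= minKnown;
    }

    // Hypothetical threshold of three distinct known headers.
    public static String detect(String text) {
        return looksLikeMessage(text, 3) ? "message/rfc822" : "text/plain";
    }

    public static void main(String[] args) {
        String prose = "Meeting notes\nReceived: the package on Monday\nAll good.\n";
        String mail = "Received: from mx.example.org\nFrom: a@example.org\n"
                + "To: b@example.org\nSubject: hi\n\nBody\n";
        System.out.println(detect(prose)); // text/plain
        System.out.println(detect(mail));  // message/rfc822
    }
}
```

Requiring several distinct headers at the very start of the file is one way to trade a little recall for far fewer false positives on ordinary text. A real detector would also need to handle byte encodings and read only a bounded prefix of the stream.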

As to priority, got me :)

> Text File identified as message/rfc822
> --------------------------------------
>
>                 Key: TIKA-3153
>                 URL: https://issues.apache.org/jira/browse/TIKA-3153
>             Project: Tika
>          Issue Type: Bug
>          Components: detector
>    Affects Versions: 1.24.1
>            Reporter: Akash
>            Priority: Major
>         Attachments: TextFileIdentifiedAsMessage.txt
>
>
> A text file containing the word Received: is identified as message/rfc822.
> We were previously using version 1.9, which correctly identified the file 
> type as text/plain.
> Even when the file contains many other lines, a single line with Received: 
> is enough for the content type to be misidentified.
> To reproduce, run: java -jar tika-app-1.24.1.jar 
> TextFileIdentifiedAsMessage.txt



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
