[jira] [Commented] (TIKA-3153) Text File identified as message/rfc822

Kenneth William Krugler (Jira) Mon, 10 Aug 2020 11:06:09 -0700


    [ 
https://issues.apache.org/jira/browse/TIKA-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17174976#comment-17174976
 ]


Kenneth William Krugler commented on TIKA-3153:
-----------------------------------------------

I think that for many text-based formats, we'll eventually need to support 
dedicated detectors (probably triggered off some regex match) to reduce the 
growing number of false positives as we add support for more formats. 
Otherwise every change to the MIME type matchers has a significant chance of 
introducing a regression.
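
A minimal sketch of what such a dedicated check might look like for this 
issue, in plain Java with no Tika dependency. It is not Tika's actual 
detection logic: the class name Rfc822Heuristic, the header list, and the 
threshold of two distinct header fields are all assumptions made for 
illustration. The idea is that a single line starting with Received: should 
not be enough; the leading block of the file has to look like a header block.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tika.
public class Rfc822Heuristic {
    // A few header field names commonly seen at the top of an email message
    // (illustrative list, not exhaustive). MULTILINE makes ^ match at the
    // start of every line.
    private static final Pattern HEADER = Pattern.compile(
        "^(Received|From|To|Subject|Date|Message-ID|Return-Path|MIME-Version):[ \\t]",
        Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);

    // Assumed threshold: require at least two distinct header fields.
    private static final int MIN_DISTINCT_HEADERS = 2;

    public static boolean looksLikeRfc822(String text) {
        // Only inspect the candidate header block: everything before the
        // first blank line (or the whole text if there is none).
        String headerBlock = text.split("\\r?\\n\\r?\\n", 2)[0];
        Set<String> seen = new HashSet<>();
        Matcher m = HEADER.matcher(headerBlock);
        while (m.find()) {
            seen.add(m.group(1).toLowerCase(Locale.ROOT));
        }
        return seen.size() >= MIN_DISTINCT_HEADERS;
    }

    public static String detect(String text) {
        return looksLikeRfc822(text) ? "message/rfc822" : "text/plain";
    }

    public static void main(String[] args) {
        // A plain text file that happens to contain one "Received:" line.
        System.out.println(detect(
            "Some notes\nReceived: the parcel on Monday\nMore notes\n"));
        // prints text/plain

        // Something that actually starts with a block of mail headers.
        System.out.println(detect(
            "Received: from a.example by b.example\n"
            + "From: a@example.com\nSubject: hi\n\nBody\n"));
        // prints message/rfc822
    }
}
```

A stricter variant could also require the header block to start on the first 
line of the file, at the cost of missing messages with leading junk.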

As to priority, got me :)

> Text File identified as message/rfc822
> --------------------------------------
>
>                 Key: TIKA-3153
>                 URL: https://issues.apache.org/jira/browse/TIKA-3153
>             Project: Tika
>          Issue Type: Bug
>          Components: detector
>    Affects Versions: 1.24.1
>            Reporter: Akash
>            Priority: Major
>         Attachments: TextFileIdentifiedAsMessage.txt
>
>
> A text file containing the word Received: is identified as message/rfc822.
> We were previously using version 1.9, which correctly identified the file 
> type as text/plain.
> Even if the file has multiple lines, a single line containing Received: is 
> enough for the content type to be identified incorrectly.
> To reproduce, run java -jar tika-app-1.24.1.jar 
> TextFileIdentifiedAsMessage.txt



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
