[ 
https://issues.apache.org/jira/browse/TIKA-2594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16390028#comment-16390028
 ] 

Luis Filipe Nassif commented on TIKA-2594:
------------------------------------------

We have used that magic, restricted to offset 0:1000, for a long time with
very few false positives, along with:

{code}
<match value="\nReturn-Path:" type="stringignorecase" offset="0:1000"/>
<match value="\nX-Originating-IP:" type="stringignorecase" offset="0:1000"/>
<match value="\nReceived:" type="stringignorecase" offset="0:1000"/>
<match value="\nMessage-ID:" type="stringignorecase" offset="0:1000"/>
{code}
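
To illustrate what those rules test, here is a rough Python sketch. It is not Tika's code: the names matches_in_window, HEADER_MAGICS and looks_like_mail are hypothetical, and it assumes that offset="0:1000" means the pattern may start at any byte position from 0 to 1000 inclusive, that sibling matches are alternatives, and that stringignorecase ignores ASCII case.

```python
# Hypothetical helper names; this only mimics the matching semantics assumed above.
HEADER_MAGICS = [
    b"\nReturn-Path:",
    b"\nX-Originating-IP:",
    b"\nReceived:",
    b"\nMessage-ID:",
]


def matches_in_window(data: bytes, value: bytes, start: int = 0, end: int = 1000) -> bool:
    """Return True if value begins at an offset in [start, end], ignoring ASCII case."""
    needle = value.lower()
    # A match may begin as late as `end`, so the window extends len(value) bytes past it.
    window = data[start:end + len(needle)].lower()
    return needle in window


def looks_like_mail(data: bytes) -> bool:
    """Treat the four header patterns as alternatives: any single hit is enough."""
    return any(matches_in_window(data, magic) for magic in HEADER_MAGICS)


if __name__ == "__main__":
    sample = b"X-Mailer: demo\r\nreceived: from host.example\r\n\r\n<html></html>"
    print(looks_like_mail(sample))
```

Because every pattern starts with a newline, a header on the very first line of the file would not be matched by these four rules alone, and body text that merely mentions "Received:" mid-line does not trigger them either.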

> Mail detected as application/xhtml+xml
> --------------------------------------
>
>                 Key: TIKA-2594
>                 URL: https://issues.apache.org/jira/browse/TIKA-2594
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 2.0, 1.16, 1.17
>            Reporter: Andreas Meier
>            Priority: Major
>         Attachments: TestMail_inline_xhtml_plus_image.eml
>
>
> The attached mail (message/rfc822) with inline XHTML is detected as
> application/xhtml+xml.
> Regards
> Andreas



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
