[
https://issues.apache.org/jira/browse/TIKA-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17500990#comment-17500990
]
Tim Allison edited comment on TIKA-3687 at 3/3/22, 7:13 PM:
------------------------------------------------------------
Thank you [~tguerin]!
was (Author: [email protected]):
Thank you!
> Email file detected as text/html
> --------------------------------
>
> Key: TIKA-3687
> URL: https://issues.apache.org/jira/browse/TIKA-3687
> Project: Tika
> Issue Type: Bug
> Affects Versions: 2.3.0
> Reporter: Thierry Guérin
> Priority: Minor
> Fix For: 2.3.1
>
> Attachments: testRFC822-ARC.eml
>
>
> The attached email (which I redacted from a real email received from
> Office365) is detected as HTML.
> This is because it contains ARC-* headers, but they are not the first
> headers in the message, so the matcher that looks for ARC headers fails. The
> matcher for the regular 'From' header also fails because the 'From' header
> occurs after the first 1024 characters.
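
The failure mode described above can be illustrated with a minimal sketch. This is not Tika's detection code: the function name, the regular expression, and the sample messages are hypothetical, and the 1024-byte window is simply taken from the description. It only shows how a header matcher limited to a fixed prefix misses a 'From' header that appears later in the message.

```python
import re

# Assumed detection window, taken from the issue description (not from Tika's source).
WINDOW = 1024

# Hypothetical pattern: a "From:" header at the start of a line.
FROM_HEADER = re.compile(rb"^From:", re.MULTILINE)


def looks_like_email(data: bytes, window: int = WINDOW) -> bool:
    """Return True if a From: header starts within the first `window` bytes."""
    return FROM_HEADER.search(data[:window]) is not None


# Leading headers long enough to push anything after them past the window.
long_headers = (
    b"Received: from example.invalid\r\n"
    + b"X-Padding: " + b"a" * 1100 + b"\r\n"
)

# From: comes after the long headers, so it falls outside the window.
late_from = long_headers + b"From: alice@example.invalid\r\n\r\n<html></html>"

# From: is the first header, so it falls inside the window.
early_from = b"From: alice@example.invalid\r\n" + long_headers + b"\r\n<html></html>"

print(looks_like_email(early_from))  # From: inside the window
print(looks_like_email(late_from))   # From: beyond the window
```

In this sketch, a message that is not recognized as email would fall through to whatever other detectors apply, which matches the reported symptom of an HTML-bodied email being classified as text/html.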