[ https://issues.apache.org/jira/browse/TIKA-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17500990#comment-17500990 ]
Tim Allison edited comment on TIKA-3687 at 3/3/22, 7:13 PM:
------------------------------------------------------------

Thank you [~tguerin]!


was (Author: talli...@mitre.org):
    Thank you!

> Email file detected as text/html
> --------------------------------
>
>                 Key: TIKA-3687
>                 URL: https://issues.apache.org/jira/browse/TIKA-3687
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 2.3.0
>            Reporter: Thierry Guérin
>            Priority: Minor
>             Fix For: 2.3.1
>
>         Attachments: testRFC822-ARC.eml
>
>
> The attached email (which I redacted from a real email received from
> Office365) is detected as HTML.
> This is because it contains ARC-* headers, but they are not the first
> headers, so the matcher that looks for ARC headers fails, and the matcher
> for the regular 'From' header also fails because the 'From' header occurs
> after the first 1024 characters.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
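
The detection failure described in the issue can be illustrated with a minimal Python sketch. It is not Tika's implementation (Tika is written in Java and defines its matchers elsewhere); the function names, the sample message, and the exact matching rules below are assumptions made only for illustration. The only detail taken from the report is the 1024-character window. The last function shows one possible relaxation, which is not necessarily the fix that shipped in 2.3.1.

```python
WINDOW = 1024  # prefix length, taken from the "1024 characters" in the report


def matches_arc_at_start(data: bytes) -> bool:
    """Hypothetical matcher: an ARC-* header must be the very first header."""
    return data.startswith(b"ARC-")


def matches_from_in_window(data: bytes) -> bool:
    """Hypothetical matcher: a 'From:' header line must begin inside the window."""
    return b"\nFrom:" in data[:WINDOW]


def matches_arc_in_window(data: bytes) -> bool:
    """Assumed relaxation: accept an ARC-* header on any line inside the window."""
    head = data[:WINDOW]
    return head.startswith(b"ARC-") or b"\nARC-" in head


# Made-up message: the ARC header is second, and 'From:' starts after byte 1024.
sample = (
    b"Received: from mail.example.com\r\n"
    b"ARC-Seal: i=1; a=rsa-sha256; " + b"x" * 1100 + b"\r\n"
    b"From: sender@example.com\r\n"
    b"\r\n<html><body>hello</body></html>\r\n"
)

print(matches_arc_at_start(sample))    # ARC header is not the first header
print(matches_from_in_window(sample))  # 'From:' lies beyond the window
print(matches_arc_in_window(sample))   # relaxed check still sees the ARC header
```

With both strict matchers failing, a detector would have nothing email-specific to go on and could fall back to the HTML content in the body, which matches the behaviour reported for testRFC822-ARC.eml.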