[
https://issues.apache.org/jira/browse/TIKA-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17500986#comment-17500986
]
ASF GitHub Bot commented on TIKA-3687:
--------------------------------------
tballison merged pull request #520:
URL: https://github.com/apache/tika/pull/520
> Email file detected as text/html
> --------------------------------
>
> Key: TIKA-3687
> URL: https://issues.apache.org/jira/browse/TIKA-3687
> Project: Tika
> Issue Type: Bug
> Affects Versions: 2.3.0
> Reporter: Thierry Guérin
> Priority: Minor
> Attachments: testRFC822-ARC.eml
>
>
> The attached email (which I redacted from a real email received from
> Office365) is detected as HTML.
> This is because it contains ARC-* headers, but they are not the first
> headers, so the matcher that looks for ARC headers fails; the matcher for
> the regular 'From' header also fails because the 'From' header occurs after
> the first 1024 characters.
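The failure mode described above can be illustrated with a minimal Python sketch. This is not Tika's code, and the fix merged in PR #520 may work differently. `naive_looks_like_email` is a hypothetical stand-in for the two matchers in the report: an ARC-* header at offset 0, or a 'From:' line within the first 1024 characters. `header_block_looks_like_email` is a hypothetical, order-independent alternative that walks the whole header section up to the first blank line. The header-name list, the `min_hits` threshold, and the sample message are assumptions chosen for illustration.

```python
import re

# A header line "Name: value"; Name uses RFC 5322 field-name characters
# (printable US-ASCII except ':').
HEADER_LINE = re.compile(r"^([!-9;-~]+):")

# Assumed, illustrative subset of header names typical of email messages.
EMAIL_HEADERS = {
    "from", "to", "cc", "subject", "date", "message-id", "received",
    "return-path", "mime-version", "arc-seal", "arc-message-signature",
    "arc-authentication-results",
}


def naive_looks_like_email(text: str, window: int = 1024) -> bool:
    """Hypothetical stand-in for the matchers in the report:
    ARC-* must come first, or 'From:' must start a line within `window` chars."""
    if text.startswith("ARC-"):
        return True
    return re.search(r"^From:", text[:window], re.MULTILINE) is not None


def header_block_looks_like_email(text: str, min_hits: int = 2) -> bool:
    """Hypothetical alternative: scan every line of the header section,
    regardless of header order, and count recognised header names."""
    hits = 0
    for line in text.splitlines():
        if line == "":
            break  # a blank line ends the header section
        if line[0] in " \t":
            continue  # folded continuation of the previous header
        m = HEADER_LINE.match(line)
        if m is None:
            return False  # not a header line, so probably not an email
        if m.group(1).lower() in EMAIL_HEADERS:
            hits += 1
    return hits >= min_hits


# Assumed sample: ARC-Seal is not the first header, and a long folded
# signature pushes 'From:' past the first 1024 characters.
SAMPLE = (
    "Received: from mail.example.com by mx.example.org; Mon, 3 Jan 2022 10:00:00 +0000\n"
    "ARC-Seal: i=1; a=rsa-sha256; d=example.com; cv=none;\n"
    "\tb=" + "A" * 1100 + "\n"
    "From: Alice <alice@example.com>\n"
    "Subject: test\n"
    "\n"
    "<html><body>Hello</body></html>\n"
)

if __name__ == "__main__":
    print(naive_looks_like_email(SAMPLE))         # False
    print(header_block_looks_like_email(SAMPLE))  # True
```

Requiring every unfolded line before the first blank line to parse as a header is what keeps the order-independent scan from accepting an HTML document that merely mentions "From:" somewhere in its body.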
--
This message was sent by Atlassian Jira
(v8.20.1#820001)