[
https://issues.apache.org/jira/browse/TIKA-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17500953#comment-17500953
]
ASF GitHub Bot commented on TIKA-3687:
--------------------------------------
SchwingSK opened a new pull request #520:
URL: https://github.com/apache/tika/pull/520
1024 may be a bit overkill for the X|DKIM|ARC headers lookahead?
> Email file detected as text/html
> --------------------------------
>
> Key: TIKA-3687
> URL: https://issues.apache.org/jira/browse/TIKA-3687
> Project: Tika
> Issue Type: Bug
> Affects Versions: 2.3.0
> Reporter: Thierry Guérin
> Priority: Minor
> Attachments: testRFC822-ARC.eml
>
>
> The attached email (which I redacted from a real email received from
> Office365) is detected as HTML.
> This is because it contains ARC-* headers, but they are not the first
> headers, so the matcher that looks for ARC- headers fails, and the matcher
> for the regular 'From' header also fails because the 'From' header occurs
> after the first 1024 characters.
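
To illustrate the failure mode described above, here is a minimal sketch in Java. It is not Tika's actual detection code: the class name HeaderLookaheadSketch, the helpers headerWithin and sampleMessage, and the sample header values are all hypothetical and exist only for this illustration. It assumes a matcher that searches for a header at the start of a line within a fixed-size window at the beginning of the message.

```java
import java.util.regex.Pattern;

public class HeaderLookaheadSketch {

    // Hypothetical helper (not part of Tika): returns true if a line that
    // begins with "<header>:" starts within the first `limit` characters.
    public static boolean headerWithin(String message, String header, int limit) {
        String window = message.substring(0, Math.min(limit, message.length()));
        Pattern p = Pattern.compile("^" + Pattern.quote(header) + ":",
                Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);
        return p.matcher(window).find();
    }

    // Builds an illustrative message in which a long ARC-Seal header pushes
    // the 'From' header `paddingChars` characters further into the file.
    public static String sampleMessage(int paddingChars) {
        StringBuilder sb = new StringBuilder("Received: from mail.example.org\r\n");
        sb.append("ARC-Seal: i=1; a=rsa-sha256; b=");
        for (int i = 0; i < paddingChars; i++) {
            sb.append('x');
        }
        sb.append("\r\nFrom: someone@example.org\r\n");
        sb.append("Subject: test\r\n\r\n<html><body>hello</body></html>");
        return sb.toString();
    }

    public static void main(String[] args) {
        String msg = sampleMessage(2000);
        // The message does not begin with an ARC- header, so a matcher
        // anchored at offset 0 would not fire.
        System.out.println("starts with ARC-: " + msg.startsWith("ARC-"));
        // 'From' lies beyond the first 1024 characters.
        System.out.println("From within 1024: " + headerWithin(msg, "From", 1024));
        // A larger window reaches it.
        System.out.println("From within 4096: " + headerWithin(msg, "From", 4096));
    }
}
```

With neither check succeeding, a detector of this shape would have nothing that identifies the file as an email, which is consistent with the HTML body determining the reported type. The window sizes used here are only examples.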