[ 
https://issues.apache.org/jira/browse/TIKA-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17501040#comment-17501040
 ] 

Hudson commented on TIKA-3687:
------------------------------

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #475 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/475/])
[TIKA-3687] Fix email detection (#520) (github: 
[https://github.com/apache/tika/commit/5444f80d1b71845ff47c91376f5c90a40dae5a4f])
* (edit) tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
* (add) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-mail-module/src/test/resources/test-documents/testRFC822-ARC
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-mail-module/src/test/java/org/apache/tika/parser/mail/RFC822ParserTest.java


> Email file detected as text/html
> --------------------------------
>
>                 Key: TIKA-3687
>                 URL: https://issues.apache.org/jira/browse/TIKA-3687
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 2.3.0
>            Reporter: Thierry Guérin
>            Priority: Minor
>             Fix For: 2.3.1
>
>         Attachments: testRFC822-ARC.eml
>
>
> The attached email (which I redacted from a real email received from 
> Office365) is detected a HTML.
> This is because it contains ARC * headers, but they're not the first one, so 
> the matcher that looks for ARC headers fails, and the matcher for regular 
> 'From' header also fails because the 'From' headers occurs after 1024 
> characters.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to