[ 
https://issues.apache.org/jira/browse/TIKA-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377749#comment-16377749
 ] 

Hudson commented on TIKA-2578:
------------------------------

SUCCESS: Integrated in Jenkins build Tika-trunk #1442 (See 
[https://builds.apache.org/job/Tika-trunk/1442/])
 TIKA-2578 and TIKA-2587 -- Allow for RFC822 detection for files (tallison: 
[https://github.com/apache/tika/commit/8289d0b704514067a57928905b3654486b5831eb])
* (edit) tika-parsers/src/test/java/org/apache/tika/mime/TestMimeTypes.java
* (add) tika-parsers/src/test/resources/test-documents/testRFC822_dkim.eml
* (edit) CHANGES.txt
* (edit) tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
* (add) tika-parsers/src/test/resources/test-documents/testRFC822_x-.eml


> Mails not recognized when unknown X-headers are present
> -------------------------------------------------------
>
>                 Key: TIKA-2578
>                 URL: https://issues.apache.org/jira/browse/TIKA-2578
>             Project: Tika
>          Issue Type: Bug
>          Components: detector, mime
>    Affects Versions: 1.17, 1.18, 2.0.0
>            Reporter: Andreas Meier
>            Assignee: Tim Allison
>            Priority: Major
>             Fix For: 1.18, 2.0.0
>
>         Attachments: testRFC822_with_leading_x_header
>
>
> I found some mails that begin with X-headers.
> These mails are detected as text/plain.
> One example is Cisco's IronPort, which may add "X-IronPort-AV" at the 
> beginning of a mail.
> I would therefore like to discuss whether and how Tika should handle these cases.
> In my opinion, Tika should try to detect files with leading X-headers and 
> preprocess them to get a valid mail.
> Suggestion:
> {code:xml}
> <mime-type type="text/x-tika-x-header">
>   <magic priority="50">
>     <match value="X-" type="string" offset="0">
>       <match value="Message-ID:" type="string" offset="0:8192"/>
>       <match value="From:" type="stringignorecase" offset="0:8192"/>
>       <match value="To:" type="stringignorecase" offset="0:8192"/>
>       <match value="Subject:" type="string" offset="0:8192"/>
>       <match value="MIME-Version:" type="stringignorecase" offset="0:8192"/>
>     </match>
>   </magic>
>   <sub-class-of type="text/x-tika-text-based-message"/>
> </mime-type>
> {code}
> See also: [RFC6648|https://tools.ietf.org/html/rfc6648]
> An example file is attached.
> Regards
> Andreas
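
For readers unfamiliar with magic rules, the check proposed in the suggestion above can be sketched in plain Java. This is only an illustration, not Tika's implementation and not the committed fix: the class and method names are hypothetical, and it assumes that a nested match is ANDed with its parent while sibling matches are ORed, and that offset="0:8192" means the pattern may start anywhere in the first 8192 bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical stand-alone sketch of the proposed magic rule; it does not use Tika.
final class XHeaderMagicSketch {
    // Mirrors offset="0:8192": a pattern may start at any offset from 0 to 8192.
    private static final int WINDOW = 8192;

    static boolean looksLikeXHeaderMail(byte[] data) {
        // Decode as ISO-8859-1 so that every byte maps to exactly one char
        // and string indexes equal byte offsets.
        int len = Math.min(data.length, WINDOW + 32);
        String head = new String(data, 0, len, StandardCharsets.ISO_8859_1);

        // Outer match: the literal "X-" at offset 0 (case-sensitive).
        if (!head.startsWith("X-")) {
            return false;
        }
        String lower = head.toLowerCase(Locale.ROOT);

        // Inner matches, assumed to be alternatives (OR).
        // type="string" is case-sensitive, "stringignorecase" is not.
        return startsWithinWindow(head, "Message-ID:")
                || startsWithinWindow(lower, "from:")
                || startsWithinWindow(lower, "to:")
                || startsWithinWindow(head, "Subject:")
                || startsWithinWindow(lower, "mime-version:");
    }

    // True if the first occurrence of needle starts at an offset within the window.
    private static boolean startsWithinWindow(String haystack, String needle) {
        int idx = haystack.indexOf(needle);
        return idx >= 0 && idx <= WINDOW;
    }

    public static void main(String[] args) {
        byte[] mail = ("X-IronPort-AV: E=Sophos\r\n"
                + "From: sender@example.com\r\n"
                + "Subject: test\r\n\r\nbody").getBytes(StandardCharsets.ISO_8859_1);
        byte[] text = "X-ray notes, nothing else".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(looksLikeXHeaderMail(mail));
        System.out.println(looksLikeXHeaderMail(text));
    }
}
```

Like the magic rule itself, the sketch matches raw substrings, so a header such as "X-Forwarded-From:" would also satisfy the "From:" test; it is a heuristic, not a parser.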



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
