[ https://issues.apache.org/jira/browse/TIKA-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377749#comment-16377749 ]
Hudson commented on TIKA-2578:
------------------------------

SUCCESS: Integrated in Jenkins build Tika-trunk #1442 (See [https://builds.apache.org/job/Tika-trunk/1442/])
TIKA-2578 and TIKA-2587 -- Allow for RFC822 detection for files (tallison: [https://github.com/apache/tika/commit/8289d0b704514067a57928905b3654486b5831eb])
* (edit) tika-parsers/src/test/java/org/apache/tika/mime/TestMimeTypes.java
* (add) tika-parsers/src/test/resources/test-documents/testRFC822_dkim.eml
* (edit) CHANGES.txt
* (edit) tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
* (add) tika-parsers/src/test/resources/test-documents/testRFC822_x-.eml


> Mails not recognized when unknown X-headers are present
> -------------------------------------------------------
>
>                 Key: TIKA-2578
>                 URL: https://issues.apache.org/jira/browse/TIKA-2578
>             Project: Tika
>          Issue Type: Bug
>          Components: detector, mime
>    Affects Versions: 1.17, 1.18, 2.0.0
>            Reporter: Andreas Meier
>            Assignee: Tim Allison
>            Priority: Major
>             Fix For: 1.18, 2.0.0
>
>         Attachments: testRFC822_with_leading_x_header
>
>
> Found some mails with leading X-headers.
> These mails are recognized as text/plain.
> One example is Cisco's IronPort, which might add "X-IronPort-AV" to the
> beginning of mails.
> Therefore I would like to discuss whether and how Tika should handle these cases.
> In my opinion Tika should try to detect files with X-headers and preprocess
> them to get a valid mail.
> Suggestion:
> {code:xml}
> <mime-type type="text/x-tika-x-header">
>   <magic priority="50">
>     <match value="X-" type="string" offset="0">
>       <match value="Message-ID:" type="string" offset="0:8192"/>
>       <match value="From:" type="stringignorecase" offset="0:8192"/>
>       <match value="To:" type="stringignorecase" offset="0:8192"/>
>       <match value="Subject:" type="string" offset="0:8192"/>
>       <match value="MIME-Version:" type="stringignorecase" offset="0:8192"/>
>     </match>
>   </magic>
>   <sub-class-of type="text/x-tika-text-based-message"/>
> </mime-type>
> {code}
> See also: [RFC6648|https://tools.ietf.org/html/rfc6648]
> Attached an example file.
> Regards
> Andreas



--
This message was sent by Atlassian JIRA (v7.6.3#76005)
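
For discussion, the matching logic of the suggested magic can be sketched in plain Java. This is only an illustration and not Tika's detector. The class name XHeaderMailSniffer and its method are hypothetical, and the sample header value is made up. The sketch assumes that the nested match elements are alternatives (any one of them is enough) that must hold in addition to the enclosing "X-" match, and that offset 0:8192 bounds the position where a pattern may start.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper that mimics the suggested magic rule; not part of Tika.
final class XHeaderMailSniffer {

    // Assumption: "0:8192" means a pattern may start at any offset 0..8192.
    private static final int WINDOW = 8192;

    // Nested matches declared with type="string" (case-sensitive).
    private static final String[] CASE_SENSITIVE = {"Message-ID:", "Subject:"};

    // Nested matches declared with type="stringignorecase".
    private static final String[] CASE_INSENSITIVE = {"From:", "To:", "MIME-Version:"};

    static boolean looksLikeXHeaderMail(byte[] data) {
        // Read a little past the window so a pattern starting at offset 8192 still fits.
        int len = Math.min(data.length, WINDOW + 64);
        // ISO-8859-1 maps each byte to one char, so string indices equal byte offsets.
        String head = new String(data, 0, len, StandardCharsets.ISO_8859_1);

        // Outer match: the file must begin with "X-" at offset 0.
        if (!head.startsWith("X-")) {
            return false;
        }
        // Inner matches: any one known header name within the window is enough.
        for (String pattern : CASE_SENSITIVE) {
            if (startsWithinWindow(head, pattern)) {
                return true;
            }
        }
        String lowerHead = head.toLowerCase(Locale.ROOT);
        for (String pattern : CASE_INSENSITIVE) {
            if (startsWithinWindow(lowerHead, pattern.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    // True if the first occurrence of needle starts at an offset in 0..WINDOW.
    private static boolean startsWithinWindow(String haystack, String needle) {
        int idx = haystack.indexOf(needle);
        return idx >= 0 && idx <= WINDOW;
    }

    public static void main(String[] args) {
        String mail = "X-IronPort-AV: example\r\nFrom: a@example.org\r\nSubject: hi\r\n\r\nbody";
        String notes = "X-ray results are attached as plain notes.";
        System.out.println(looksLikeXHeaderMail(mail.getBytes(StandardCharsets.ISO_8859_1)));
        System.out.println(looksLikeXHeaderMail(notes.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

Because the inner patterns are plain substring searches, a short name such as "To:" also matches inside "Reply-To:" or in ordinary text, so a rule of this shape may need a tighter anchor (for example, a preceding line break) to avoid false positives.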