[ https://issues.apache.org/jira/browse/TIKA-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris A. Mattmann resolved TIKA-1771.
-------------------------------------
    Resolution: Fixed
 Fix Version/s: 1.11

Thanks [~jeremybmerrill]!

{noformat}
[chipotle:~/tmp/tika1.11] mattmann% svn commit -m "Fix for TIKA-1771 lower magic priority xhtml magic priority to ensure emails detected as message/rfc822 contributed by Jeremy B. Merrill <jeremy.merr...@nytimes.com> this closes #58."
Sending CHANGES.txt
Sending tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
Transmitting file data ..
Committed revision 1709301.
[chipotle:~/tmp/tika1.11] mattmann%
{noformat}

> lower magic priority xhtml magic priority to ensure emails detected as
> message/rfc822
> -------------------------------------------------------------------------------------
>
>                 Key: TIKA-1771
>                 URL: https://issues.apache.org/jira/browse/TIKA-1771
>             Project: Tika
>          Issue Type: Improvement
>          Components: detector
>            Reporter: Jeremy B. Merrill
>            Assignee: Chris A. Mattmann
>            Priority: Critical
>             Fix For: 1.11
>
>
> Emails I have (happy to share if you want) contain XHTML as one part of a
> multipart email. Prior to this pull request, the priority on the
> application/xhtml+xml magic detector was 50, equal to the priority on the
> message/rfc822 detector. Because of the relative position of the two
> detectors in tika-mimetypes.xml, the emails were incorrectly detected as
> XHTML documents.
> With this PR, by downgrading the priority of application/xhtml+xml to 40, the
> more specific email magic detectors take precedence, causing the emails to
> be properly detected as message/rfc822.
> I have not run this through the govdocs tester or anything other than my own
> documents, so, full disclosure, this could cause false negative
> XHTML detections elsewhere.
> I should note this occurs on trunk, from GitHub, up to date as of Tuesday-ish.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
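The tie-break described in the issue can be illustrated with a small, self-contained model. This is a sketch under stated assumptions, not Tika's actual `MimeTypes` code: it assumes magic rules are tried from highest priority to lowest, that rules with equal priority keep their declaration order in tika-mimetypes.xml, and that the first matching rule wins. The names `MagicRule` and `detect`, the byte patterns, and the offsets are all hypothetical and are not copied from tika-mimetypes.xml; only the priorities (50 before the fix, 40 after it for application/xhtml+xml) come from the issue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MagicRule:
    """Hypothetical simplification of one <magic> entry."""
    mime_type: str
    priority: int
    pattern: bytes
    max_offset: int  # the pattern may start anywhere in [0, max_offset]

    def matches(self, data: bytes) -> bool:
        # bytes.find(sub, start, end) only reports matches lying fully inside data[start:end]
        return data.find(self.pattern, 0, self.max_offset + len(self.pattern)) != -1


def detect(rules, data, default="application/octet-stream"):
    # sorted() is stable even with reverse=True, so rules with equal priority
    # keep their declaration order: that order is the tie-breaker here.
    for rule in sorted(rules, key=lambda r: r.priority, reverse=True):
        if rule.matches(data):
            return rule.mime_type  # assumption: first match wins
    return default


# Illustrative multipart email whose second part is XHTML.
EMAIL = (
    b"Received: from mail.example.com\r\n"
    b"Content-Type: multipart/alternative; boundary=XYZ\r\n\r\n"
    b"--XYZ\r\nContent-Type: application/xhtml+xml\r\n\r\n"
    b'<html xmlns="http://www.w3.org/1999/xhtml"><body>Hi</body></html>\r\n'
    b"--XYZ--\r\n"
)
XHTML_DOC = b'<?xml version="1.0"?>\n<html xmlns="http://www.w3.org/1999/xhtml"></html>'

# Hypothetical rule sets; XHTML is declared first in both.
BEFORE = [
    MagicRule("application/xhtml+xml", 50, b"<html xmlns=", 8192),
    MagicRule("message/rfc822", 50, b"Received:", 0),
]
AFTER = [
    MagicRule("application/xhtml+xml", 40, b"<html xmlns=", 8192),
    MagicRule("message/rfc822", 50, b"Received:", 0),
]

if __name__ == "__main__":
    print(detect(BEFORE, EMAIL))     # application/xhtml+xml (tie broken by order)
    print(detect(AFTER, EMAIL))      # message/rfc822
    print(detect(AFTER, XHTML_DOC))  # application/xhtml+xml
```

In this model, lowering the XHTML priority does not stop plain XHTML files from being recognized, because the email rule simply fails to match them. The risk the reporter flags (false negative XHTML detections) would come from real inputs that happen to satisfy a higher-priority rule as well, which a toy model like this cannot rule out.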