[ 
https://issues.apache.org/jira/browse/TIKA-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14962583#comment-14962583
 ] 

Hudson commented on TIKA-1771:
------------------------------

SUCCESS: Integrated in tika-trunk-jdk1.7 #872 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.7/872/])
Fix for TIKA-1771: lower the XHTML magic priority to ensure emails are 
detected as message/rfc822. Contributed by Jeremy B. Merrill 
<[email protected]>; this closes #58. (mattmann: 
[http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1709301])
* trunk/CHANGES.txt
* trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml


> Lower XHTML magic priority to ensure emails are detected as message/rfc822
> --------------------------------------------------------------------------
>
>                 Key: TIKA-1771
>                 URL: https://issues.apache.org/jira/browse/TIKA-1771
>             Project: Tika
>          Issue Type: Improvement
>          Components: detector
>            Reporter: Jeremy B. Merrill
>            Assignee: Chris A. Mattmann
>            Priority: Critical
>             Fix For: 1.11
>
>
> Some emails I have (happy to share if you want) contain XHTML as one part 
> of a multipart message. Prior to this pull request, the priority of the 
> application/xhtml+xml magic detector was 50, equal to the priority of the 
> message/rfc822 detector. Because of the relative position of the two 
> detectors in tika-mimetypes.xml, the emails were incorrectly detected as 
> XHTML documents.
> This PR lowers the priority of application/xhtml+xml to 40 so that the 
> more sensitive email magic detectors take precedence and the emails are 
> correctly detected as message/rfc822.
> I have not run this through the govdocs tester or against anything other 
> than my own documents, so, full disclosure, this could cause false-negative 
> XHTML detections elsewhere.
> Note that this occurs on trunk, from GitHub, up to date as of roughly 
> Tuesday.
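
The tie described above can be illustrated with a toy model. This is only a sketch, not Tika's detector code: the MagicRule class, the detect function, the byte patterns, and the sample message are all hypothetical, and it assumes (as the description suggests) that position in the file breaks ties between rules of equal priority. Tika's real tie-breaking rule may differ.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class MagicRule:
    """Hypothetical stand-in for one magic entry in a MIME types file."""
    mime_type: str
    priority: int
    pattern: bytes  # illustrative pattern, not copied from tika-mimetypes.xml


def detect(data: bytes, rules: Sequence[MagicRule]) -> str:
    """Return the type of the first matching rule, highest priority first.

    sorted() is stable, so rules with equal priority keep their file order.
    That models the assumption that position breaks ties.
    """
    for rule in sorted(rules, key=lambda r: -r.priority):
        if rule.pattern in data:
            return rule.mime_type
    return "application/octet-stream"


def make_rules(xhtml_priority: int) -> list:
    # The XHTML rule is listed first, standing in for its relative position.
    return [
        MagicRule("application/xhtml+xml", xhtml_priority, b"<html xmlns="),
        MagicRule("message/rfc822", 50, b"Return-Path:"),
    ]


# A made-up multipart email whose body contains an XHTML part.
email = (
    b"Return-Path: <sender@example.com>\r\n"
    b"Content-Type: multipart/alternative\r\n\r\n"
    b'<html xmlns="http://www.w3.org/1999/xhtml">...</html>'
)

before = detect(email, make_rules(50))  # both rules at priority 50
after = detect(email, make_rules(40))   # XHTML rule lowered to 40
print(before, after)
```

In this model the message matches both rules, so the outcome depends only on ordering. Lowering the XHTML priority moves the email rule ahead of it without changing either pattern, which is also why a document that matches only the XHTML pattern would still be detected as XHTML.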



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
