[ 
https://issues.apache.org/jira/browse/TIKA-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995874#comment-13995874
 ] 

Phil Lester commented on TIKA-1296:
-----------------------------------

Hi Ken,
I think the option was added by TIKA-1146. I can't think of any good reason not 
to change them all. That approach seems preferable, since someone could always 
accidentally change the case of one or more of the tags. Thanks.

> Add case insensitive matching for text/html mime type
> -----------------------------------------------------
>
>                 Key: TIKA-1296
>                 URL: https://issues.apache.org/jira/browse/TIKA-1296
>             Project: Tika
>          Issue Type: Improvement
>          Components: mime
>    Affects Versions: 1.5
>            Reporter: Phil Lester
>
> Currently, tika-mimetypes.xml provides matches in a couple of different cases 
> for the elements of the mime type text/html (and possibly others), so that 
> varying HTML writing styles are matched. As of Tika 1.5, these matches can be 
> made case insensitive using the "stringignorecase" type. This would allow 
> consolidation of some matches and improve detection of poorly formed HTML 
> that most browsers would render regardless of case.
> For example:
>       <match value="&lt;BODY" type="string" offset="0"/>
>       <match value="&lt;body" type="string" offset="0"/>
> could become:
>       <match value="&lt;BODY" type="stringignorecase" offset="0"/>
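
To illustrate the effect of the proposed consolidation, here is a minimal Python sketch. It is not Tika's implementation: the `matches` helper and the sample inputs are hypothetical, and it only assumes that a "string" match compares bytes exactly at the given offset while "stringignorecase" compares them without regard to ASCII case.

```python
def matches(data: bytes, value: bytes, ignore_case: bool = False, offset: int = 0) -> bool:
    """Return True if `value` occurs in `data` starting at `offset` (hypothetical helper)."""
    window = data[offset:offset + len(value)]
    if ignore_case:
        # bytes.lower() folds ASCII letters only, which is enough for HTML tag names.
        return window.lower() == value.lower()
    return window == value


# Hypothetical document prefixes written in three different tag-case styles.
samples = [b"<BODY bgcolor=white>", b"<body>", b"<Body class='x'>"]

# Two case-sensitive rules, mirroring the pair of "string" matches quoted above.
two_rules = [any(matches(s, v) for v in (b"<BODY", b"<body")) for s in samples]

# One case-insensitive rule, mirroring the proposed "stringignorecase" match.
one_rule = [matches(s, b"<BODY", ignore_case=True) for s in samples]

print(two_rules)  # [True, True, False]
print(one_rule)   # [True, True, True]
```

Under these assumptions, the mixed-case `<Body` prefix slips past both case-sensitive rules but is caught by the single case-insensitive one, which is the detection improvement the issue describes.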



--
This message was sent by Atlassian JIRA
(v6.2#6252)
