[ https://issues.apache.org/jira/browse/TIKA-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377508#comment-17377508 ]

Packiaraj Sakkanan commented on TIKA-3466:
------------------------------------------

Hi [~nick],
We are having a problem with our allow-list. We need to accept XML as valid input, but in
this case an XHTML file is detected as XML and therefore passes through as a valid file
(the browser eventually executes that XHTML file).

If the xmlns of the root element is distinctive enough to differentiate between XML and
XHTML, then adding that check would help secure many applications.
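Below is a minimal sketch of the check suggested above. It assumes the allow-list can parse the upload with the JDK's StAX API before accepting it. It reads only the root element and treats any root in the XHTML namespace (http://www.w3.org/1999/xhtml) as XHTML, whatever its local name. The class and method names (XhtmlRootCheck, isXhtml) are hypothetical and not part of Tika. Because it inspects the root element only, XHTML elements nested under a non-XHTML root are not caught.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/**
 * Hypothetical allow-list helper (not part of Tika): reports whether the
 * root element of an XML document is in the XHTML namespace, regardless
 * of its local name (html, script, ...).
 */
public class XhtmlRootCheck {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    /** Returns the namespace URI of the root element; may be null if it has none. */
    static String rootNamespace(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Untrusted input: do not process DTDs or external entities.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                // The first START_ELEMENT event is the root element.
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    return reader.getNamespaceURI();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    /** True when the root element is in the XHTML namespace. */
    static boolean isXhtml(String xml) throws XMLStreamException {
        return XHTML_NS.equals(rootNamespace(xml));
    }

    public static void main(String[] args) throws XMLStreamException {
        String sample = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n"
                + "<script xmlns=\"http://www.w3.org/1999/xhtml\"><![CDATA[\n"
                + "  alert(555);\n"
                + "  ]]></script>";
        String plain = "<?xml version=\"1.0\"?><config><x>1</x></config>";
        System.out.println(isXhtml(sample)); // true: reject as XHTML
        System.out.println(isXhtml(plain));  // false: accept as plain XML
    }
}
```

A namespace check like this also covers prefixed roots such as `<x:script xmlns:x="http://www.w3.org/1999/xhtml">`, because StAX resolves the prefix to its namespace URI.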

> Cannot detect mimetype of xhtml file when script is first node instead of html
> ------------------------------------------------------------------------------
>
>                 Key: TIKA-3466
>                 URL: https://issues.apache.org/jira/browse/TIKA-3466
>             Project: Tika
>          Issue Type: Bug
>          Components: detector, mime
>    Affects Versions: 1.27
>            Reporter: Packiaraj Sakkanan
>            Priority: Major
>
> The MIME type of the XHTML file below is detected as 'application/xml' instead of
> 'application/xhtml+xml':
> {code:java}
> <?xml version="1.0" encoding="UTF-8" ?>
> <script xmlns="http://www.w3.org/1999/xhtml"><![CDATA[
>   alert(555);
>   ]]></script>
> {code}
>  
> One possible solution is to add a 'script' root-XML entry to tika-mimetypes.xml, like this:
> {code:java}
> <mime-type type="application/xhtml+xml">
>   <!-- The magic priority for xhtml+xml needs to be lower than that of -->
>   <!--  files that contain HTML within them, e.g. mime emails -->
>   <magic priority="40">
>     <match value="&lt;html xmlns=" type="string" offset="0:8192"/>
>   </magic>
>   <root-XML namespaceURI="http://www.w3.org/1999/xhtml" localName="html"/>
>   <root-XML namespaceURI="http://www.w3.org/1999/xhtml" localName="script"/>
>   <glob pattern="*.xhtml"/>
>   <glob pattern="*.xht"/>
> </mime-type>
> {code}
>  
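
The following example shows why the sample is not matched by an `html`-only entry and how the proposed extra entry changes that. It is a simplified, hypothetical model of `<root-XML>` matching, not Tika's actual code. In this model, a rule applies when both the root element's namespace URI and its local name equal the rule's values. Anything else falls back to `application/xml`, which is the result the issue reports. The names RootXmlRules, RootXmlRule and detect are illustrative only. The `record` syntax requires Java 16 or later.

```java
import java.util.List;
import java.util.Objects;

/**
 * Hypothetical model (not Tika's implementation) of <root-XML> matching:
 * a rule matches when both namespace URI and local name of the root
 * element equal the rule's values.
 */
public class RootXmlRules {
    record RootXmlRule(String namespaceURI, String localName) {
        boolean matches(String ns, String name) {
            return Objects.equals(namespaceURI, ns) && Objects.equals(localName, name);
        }
    }

    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Only the html entry, versus the html + script entries from the proposal.
    static final List<RootXmlRule> CURRENT = List.of(new RootXmlRule(XHTML_NS, "html"));
    static final List<RootXmlRule> PROPOSED = List.of(
            new RootXmlRule(XHTML_NS, "html"),
            new RootXmlRule(XHTML_NS, "script"));

    /** Returns the MIME type for a root element under the given rules. */
    static String detect(List<RootXmlRule> rules, String rootNs, String rootName) {
        for (RootXmlRule rule : rules) {
            if (rule.matches(rootNs, rootName)) {
                return "application/xhtml+xml";
            }
        }
        return "application/xml"; // fallback for any other well-formed XML in this model
    }

    public static void main(String[] args) {
        // The sample's root element is {http://www.w3.org/1999/xhtml}script.
        System.out.println(detect(CURRENT, XHTML_NS, "script"));  // application/xml
        System.out.println(detect(PROPOSED, XHTML_NS, "script")); // application/xhtml+xml
    }
}
```

Listing local names one by one only covers the roots someone anticipated. Matching on the XHTML namespace alone, as the comment suggests, would cover any root element in that namespace.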



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
