Packiaraj Sakkanan created TIKA-3466:
----------------------------------------

             Summary: Cannot detect mimetype of xhtml file when script is first 
node instead of html
                 Key: TIKA-3466
                 URL: https://issues.apache.org/jira/browse/TIKA-3466
             Project: Tika
          Issue Type: Bug
          Components: detector, mime
    Affects Versions: 1.27
            Reporter: Packiaraj Sakkanan


The MIME type of the XHTML file below is detected as 'application/xml' instead of 
'application/xhtml+xml':


{code:java}
<?xml version="1.0" encoding="UTF-8" ?>
<script xmlns="http://www.w3.org/1999/xhtml"><![CDATA[
  alert(555);
  ]]></script>
{code}
 

 One possible solution is to add a root-XML entry for 'script' to tika-mimetypes.xml, for example:


{code:java}
<mime-type type="application/xhtml+xml">
  <!-- The magic priority for xhtml+xml needs to be lower than that of -->
  <!--  files that contain HTML within them, e.g. mime emails -->
  <magic priority="40">
    <match value="&lt;html xmlns=" type="string" offset="0:8192"/>
  </magic>
  <root-XML namespaceURI="http://www.w3.org/1999/xhtml" localName="html"/>
  <root-XML namespaceURI="http://www.w3.org/1999/xhtml" localName="script"/>
  <glob pattern="*.xhtml"/>
  <glob pattern="*.xht"/>
</mime-type>
{code}
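
To illustrate why the extra root-XML entry would matter, here is a minimal sketch of root-element matching that uses only the JDK's StAX parser. It is not Tika's actual detector code: the class and method names (RootXmlSketch, rootMatches) are hypothetical, and it assumes that a root-XML rule matches only when both the namespace URI and the local name of the document's root element agree with the rule.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration only; not part of Tika.
final class RootXmlSketch {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // The sample file from this report.
    static final String SAMPLE =
        "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n"
        + "<script xmlns=\"" + XHTML_NS + "\"><![CDATA[\n"
        + "  alert(555);\n"
        + "  ]]></script>\n";

    // Returns true if the first element of the document has the given
    // namespace URI and local name (the assumed root-XML matching rule).
    static boolean rootMatches(String xml, String namespaceUri, String localName) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // DTDs are not needed to read the root element; disable them.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        QName root = reader.getName();
                        return namespaceUri.equals(root.getNamespaceURI())
                            && localName.equals(root.getLocalPart());
                    }
                }
                return false; // no element found
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Rule that exists today: root element must be 'html'.
        System.out.println("html rule matches:   " + rootMatches(SAMPLE, XHTML_NS, "html"));
        // Rule proposed in this report: root element may also be 'script'.
        System.out.println("script rule matches: " + rootMatches(SAMPLE, XHTML_NS, "script"));
    }
}
```

Under that assumption, the sample's root element is 'script' in the XHTML namespace, so only the proposed second rule matches it. Whether every element in the XHTML namespace should map to 'application/xhtml+xml' is a separate design question that this sketch does not address.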




 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
