Packiaraj Sakkanan created TIKA-3466:
----------------------------------------
Summary: Cannot detect mimetype of xhtml file when script is first
node instead of html
Key: TIKA-3466
URL: https://issues.apache.org/jira/browse/TIKA-3466
Project: Tika
Issue Type: Bug
Components: detector, mime
Affects Versions: 1.27
Reporter: Packiaraj Sakkanan
The MIME type of the XHTML file below is detected as 'application/xml' instead of
'application/xhtml+xml', because its root element is 'script' rather than 'html':
{code:xml}
<?xml version="1.0" encoding="UTF-8" ?>
<script xmlns="http://www.w3.org/1999/xhtml"><![CDATA[
alert(555);
]]></script>
{code}
One possible solution is to add 'script' as an additional root-XML local name in tika-mimetypes.xml, like this:
{code:xml}
<mime-type type="application/xhtml+xml">
<!-- The magic priority for xhtml+xml needs to be lower than that of -->
<!-- files that contain HTML within them, e.g. mime emails -->
<magic priority="40">
<match value="&lt;html xmlns=" type="string" offset="0:8192"/>
</magic>
<root-XML namespaceURI="http://www.w3.org/1999/xhtml" localName="html"/>
<root-XML namespaceURI="http://www.w3.org/1999/xhtml" localName="script"/>
<glob pattern="*.xhtml"/>
<glob pattern="*.xht"/>
</mime-type>
{code}
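To illustrate why the extra root-XML entry would change the result, here is a minimal sketch of root-element matching using only the JDK's StAX parser. It is not Tika's actual detector code: the class name RootXmlSketch, the detect method, and the set of accepted local names are hypothetical and only mimic the namespace-plus-local-name comparison that the root-XML entries describe.

```java
import java.io.StringReader;
import java.util.Set;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration only; this is not how Tika is implemented internally.
public class RootXmlSketch {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    /** Returns the qualified name of the first (root) element in the document. */
    static QName rootElement(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        factory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, true);
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    return reader.getName();
                }
            }
            throw new XMLStreamException("no root element found");
        } finally {
            reader.close();
        }
    }

    /**
     * Mimics a root-XML rule: the document counts as XHTML only when the root
     * element is in the XHTML namespace and its local name is in xhtmlRoots.
     */
    static String detect(String xml, Set<String> xhtmlRoots) throws XMLStreamException {
        QName root = rootElement(xml);
        boolean isXhtml = XHTML_NS.equals(root.getNamespaceURI())
                && xhtmlRoots.contains(root.getLocalPart());
        return isXhtml ? "application/xhtml+xml" : "application/xml";
    }

    public static void main(String[] args) throws XMLStreamException {
        // The sample document from this report.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n"
                + "<script xmlns=\"http://www.w3.org/1999/xhtml\"><![CDATA[\n"
                + "alert(555);\n"
                + "]]></script>\n";
        // Only 'html' accepted as root: falls back to generic XML.
        System.out.println(detect(xml, Set.of("html")));
        // With 'script' added as an accepted root: recognised as XHTML.
        System.out.println(detect(xml, Set.of("html", "script")));
    }
}
```

With only 'html' in the accepted set the sample falls through to 'application/xml'; adding 'script' yields 'application/xhtml+xml', which is the behaviour the proposed change aims for.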