[ https://issues.apache.org/jira/browse/TIKA-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Packiaraj Sakkanan updated TIKA-3466:
-------------------------------------
Description:
The MIME type of the XHTML file below is detected as 'application/xml' instead of
'application/xhtml+xml':
{code:xml}
<?xml version="1.0" encoding="UTF-8" ?>
<script xmlns="http://www.w3.org/1999/xhtml"><![CDATA[
alert(555);
]]></script>
{code}
One possible solution is to add a root-XML entry for 'script' to tika-mimetypes.xml, like this:
{code:xml}
<mime-type type="application/xhtml+xml">
  <!-- The magic priority for xhtml+xml needs to be lower than that of -->
  <!-- files that contain HTML within them, e.g. mime emails -->
  <magic priority="40">
    <match value="&lt;html xmlns=" type="string" offset="0:8192"/>
  </magic>
  <root-XML namespaceURI="http://www.w3.org/1999/xhtml" localName="html"/>
  <!-- proposed addition: also accept 'script' as an XHTML root element -->
  <root-XML namespaceURI="http://www.w3.org/1999/xhtml" localName="script"/>
  <glob pattern="*.xhtml"/>
  <glob pattern="*.xht"/>
</mime-type>
{code}
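
To illustrate why the extra root-XML entry matters, here is a minimal, JDK-only sketch of root-element matching. It is not Tika's implementation: the class RootXmlSketch and the helper detect() are hypothetical names made up for this example, and the set of allowed local names only stands in for the root-XML entries above. The sketch assumes that root-XML matching comes down to comparing the namespace URI and local name of the first element.

```java
import java.io.StringReader;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class RootXmlSketch {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Hypothetical helper (not a Tika API): reports the XHTML type only when the
    // root element is in the XHTML namespace AND its local name is in the allowed set.
    static String detect(String xml, Set<String> xhtmlRootNames) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // DTDs are not needed to inspect the root element.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                // The first START_ELEMENT event is the document's root element.
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    boolean isXhtml = XHTML_NS.equals(reader.getNamespaceURI())
                            && xhtmlRootNames.contains(reader.getLocalName());
                    return isXhtml ? "application/xhtml+xml" : "application/xml";
                }
            }
            return "application/xml";
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String sample = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n"
                + "<script xmlns=\"http://www.w3.org/1999/xhtml\"><![CDATA[\n"
                + "alert(555);\n"
                + "]]></script>";
        // Only 'html' registered: the sample falls back to generic XML.
        System.out.println(detect(sample, Set.of("html")));           // application/xml
        // With 'script' registered as well, the sample is reported as XHTML.
        System.out.println(detect(sample, Set.of("html", "script"))); // application/xhtml+xml
    }
}
```

With only 'html' in the set, the sample file falls through to the generic XML type, which matches the behaviour reported here. Adding 'script' changes the result.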
was:
mime-type of below xhtml file deduced as 'application/xml' instead of
'application/xhtml+xml'
{code:xml}
<?xml version="1.0" encoding="UTF-8" ?>
<script xmlns="http://www.w3.org/1999/xhtml"><![CDATA[
alert(555);
]]></script>
{code}
one possible solution is to add 'script' in tike-mimetypes.xml, like
{code:xml}
<mime-type type="application/xhtml+xml">
  <!-- The magic priority for xhtml+xml needs to be lower than that of -->
  <!-- files that contain HTML within them, e.g. mime emails -->
  <magic priority="40">
    <match value="&lt;html xmlns=" type="string" offset="0:8192"/>
  </magic>
  <root-XML namespaceURI="http://www.w3.org/1999/xhtml" localName="html"/>
  <!-- proposed addition: also accept 'script' as an XHTML root element -->
  <root-XML namespaceURI="http://www.w3.org/1999/xhtml" localName="script"/>
  <glob pattern="*.xhtml"/>
  <glob pattern="*.xht"/>
</mime-type>
{code}
> Cannot detect mimetype of xhtml file when script is first node instead of html
> ------------------------------------------------------------------------------
>
> Key: TIKA-3466
> URL: https://issues.apache.org/jira/browse/TIKA-3466
> Project: Tika
> Issue Type: Bug
> Components: detector, mime
> Affects Versions: 1.27
> Reporter: Packiaraj Sakkanan
> Priority: Major
>
> The MIME type of the XHTML file below is detected as 'application/xml' instead of
> 'application/xhtml+xml':
> {code:xml}
> <?xml version="1.0" encoding="UTF-8" ?>
> <script xmlns="http://www.w3.org/1999/xhtml"><![CDATA[
> alert(555);
> ]]></script>
> {code}
>
> One possible solution is to add a root-XML entry for 'script' to tika-mimetypes.xml, like this:
> {code:xml}
> <mime-type type="application/xhtml+xml">
>   <!-- The magic priority for xhtml+xml needs to be lower than that of -->
>   <!-- files that contain HTML within them, e.g. mime emails -->
>   <magic priority="40">
>     <match value="&lt;html xmlns=" type="string" offset="0:8192"/>
>   </magic>
>   <root-XML namespaceURI="http://www.w3.org/1999/xhtml" localName="html"/>
>   <!-- proposed addition: also accept 'script' as an XHTML root element -->
>   <root-XML namespaceURI="http://www.w3.org/1999/xhtml" localName="script"/>
>   <glob pattern="*.xhtml"/>
>   <glob pattern="*.xht"/>
> </mime-type>
> {code}
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)