[ https://issues.apache.org/jira/browse/TIKA-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17898240#comment-17898240 ]
ASF GitHub Bot commented on TIKA-4350:
--------------------------------------
sebastian-nagel opened a new pull request, #2045:
URL: https://github.com/apache/tika/pull/2045
Trivial fix: adds `<iframe>` as a `root-XML` hint, analogous to
`<frameset>`.
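
For readers unfamiliar with root-element hints, the following is a rough, self-contained sketch of the idea using only the JDK's StAX parser. It is not Tika's detection code: the class `RootHintSketch`, its methods, the `application/octet-stream` fallback, and the two small hint tables are hypothetical stand-ins, assuming only that a hint maps the local name of the root element to a MIME type.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration of root-element hinting; not part of Tika.
final class RootHintSketch {

    // Returns the local name of the first element, or null if the input
    // cannot be read as XML.
    static String rootLocalName(String snippet) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Do not process DTDs or external entities in untrusted input.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        try {
            XMLStreamReader reader =
                    factory.createXMLStreamReader(new StringReader(snippet));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    return reader.getLocalName();
                }
            }
        } catch (XMLStreamException e) {
            // Not well-formed XML: fall through and report "no root element".
        }
        return null;
    }

    // Maps the root element to a MIME type via the hint table; XML input
    // without a matching hint falls back to the generic application/xml.
    static String detect(String snippet, Map<String, String> hints) {
        String root = rootLocalName(snippet);
        if (root == null) {
            return "application/octet-stream"; // placeholder for "not XML"
        }
        return hints.getOrDefault(root, "application/xml");
    }

    // Simplified hint table without an entry for iframe (assumed subset).
    static Map<String, String> hintsBefore() {
        Map<String, String> hints = new HashMap<>();
        hints.put("html", "text/html");
        hints.put("frameset", "text/html");
        return hints;
    }

    // Same table with the additional iframe hint.
    static Map<String, String> hintsAfter() {
        Map<String, String> hints = hintsBefore();
        hints.put("iframe", "text/html");
        return hints;
    }

    public static void main(String[] args) {
        String snippet = "<iframe src=\"https://example.com/\"></iframe>";
        System.out.println(detect(snippet, hintsBefore()));
        System.out.println(detect(snippet, hintsAfter()));
    }
}
```

In this sketch the snippet is well-formed XML, so without an `iframe` entry the lookup falls through to the generic XML type; with the entry it resolves to `text/html`.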
> HTML snippet containing <iframe> as root element erroneously recognized as
> application/xml
> ------------------------------------------------------------------------------------------
>
> Key: TIKA-4350
> URL: https://issues.apache.org/jira/browse/TIKA-4350
> Project: Tika
> Issue Type: Bug
> Components: detector, mime
> Affects Versions: 3.0.0
> Reporter: Sebastian Nagel
> Priority: Major
>
> An HTML snippet containing an <iframe> element as document root is erroneously
> recognized as {{application/xml}}.
> This issue was reported on the Nutch user mailing list for Nutch 1.19 using
> Tika 2.3.0:
> [https://lists.apache.org/thread/fhhp1p6y4ttxmplvz1ohk3wwjz25ozbc]
> The problem is reproducible with Tika 3.0.0.