[ https://issues.apache.org/jira/browse/TIKA-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294763#comment-16294763 ]
Nick Burch commented on TIKA-1141:
----------------------------------

That might not pass in the filename, but I'm not sure on the GUI part of the Tika app... [~talli...@mitre.org] any idea?

In general though, there's no unique file magic for JS files, so we (along with most other tools) require the filename as an additional hint to get it right

> javascript files that contain "<html" are detected as text/html
> ---------------------------------------------------------------
>
>                 Key: TIKA-1141
>                 URL: https://issues.apache.org/jira/browse/TIKA-1141
>             Project: Tika
>          Issue Type: Bug
>          Components: mime
>    Affects Versions: 1.2
>            Reporter: David Hara
>            Priority: Minor
>
> The Mimetypes detector will return text/html as the mimetype for any
> javascript file that contains the string "<html" in it. I believe this is due
> to the rule <match value="<html" type="string" offset="0:8192"/> in the
> tika-mimetypes.xml file.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
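
To illustrate the point made in the comment (content sniffing alone cannot tell a JS file that mentions "<html" from an HTML page, so the filename has to act as a hint), here is a minimal, self-contained Java sketch. It is not Tika code and does not use Tika's API: the class HintedDetector, its methods, the text/plain fallback, and the simplification that a ".js" name always wins over the content match are assumptions made for this example. Only the "<html" string and the 0:8192 offset window are taken from the quoted rule.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical toy detector, for illustration only (not part of Tika).
public class HintedDetector {
    // From the quoted rule offset="0:8192": the match may start at offsets 0..8192.
    static final int MAX_START_OFFSET = 8192;
    static final String HTML_MAGIC = "<html";

    // Content-only guess: "text/html" if "<html" starts within the window,
    // otherwise "text/plain" (fallback chosen for this sketch).
    static String detectByContent(byte[] data) {
        // Keep just enough bytes for a match that starts at the last allowed offset.
        int limit = Math.min(data.length, MAX_START_OFFSET + HTML_MAGIC.length());
        // ISO-8859-1 maps each byte to one char, so string indexes equal byte offsets.
        String head = new String(data, 0, limit, StandardCharsets.ISO_8859_1);
        return head.contains(HTML_MAGIC) ? "text/html" : "text/plain";
    }

    // Content plus optional filename hint. Assumption for this sketch:
    // a ".js" extension simply takes precedence over the content match.
    static String detect(byte[] data, String fileName) {
        if (fileName != null && fileName.toLowerCase(Locale.ROOT).endsWith(".js")) {
            return "application/javascript";
        }
        return detectByContent(data);
    }

    public static void main(String[] args) {
        byte[] js = "document.write('<html><body>hi</body></html>');"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(detect(js, null));        // prints text/html
        System.out.println(detect(js, "widget.js")); // prints application/javascript
    }
}
```

Without a name, the sketch reproduces the behaviour in the report: the "<html" inside the script is enough to produce text/html. Passing the filename changes the answer, which is the reason for asking whether the GUI passes the filename through.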