[ https://issues.apache.org/jira/browse/TIKA-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15130723#comment-15130723 ]

Nick Burch commented on TIKA-1141:
----------------------------------

I've tweaked the mime magic for HTML so that "<html" gets a lower priority when 
it isn't near the start of the file. As long as the .js filename is given, Tika 
now correctly identifies these jQuery files as application/javascript. Without 
the filename it can't, because we don't have any JavaScript magic. I'm not sure 
we could add any either, given the format, but if someone wants to take a stab 
at it, that'd be great!
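The behaviour described in the comment can be modelled roughly as follows. This is a hypothetical, simplified sketch and not Tika's detection code. The 64-byte "near the start" window, the two priority values, and the rule that a .js filename beats only a weak magic hit are all assumptions chosen for illustration. Only the 8192-byte window comes from the quoted rule.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/**
 * Hypothetical model of the fix: "<html" near the start is a strong HTML
 * signal, "<html" further in is weak. NOT Tika's actual algorithm; the
 * window size, priorities and resolution order are illustrative assumptions.
 */
public class HtmlVsJsDetector {

    static final int NEAR_START = 64;      // assumed "near the start" window
    static final int MAGIC_WINDOW = 8192;  // from offset="0:8192" in the rule
    static final int STRONG = 50;          // assumed priority for an early "<html"
    static final int WEAK = 20;            // assumed lower priority otherwise

    // Priority of the HTML magic hit, or 0 if "<html" is not within the window.
    static int htmlMagicPriority(String text) {
        int idx = text.indexOf("<html");
        if (idx < 0 || idx > MAGIC_WINDOW) {
            return 0;
        }
        return idx <= NEAR_START ? STRONG : WEAK;
    }

    static String detect(byte[] data, String name) {
        // ISO-8859-1 maps every byte to one char, so indexes equal byte offsets.
        String text = new String(data, StandardCharsets.ISO_8859_1);
        int magic = htmlMagicPriority(text);
        boolean jsName = name != null
                && name.toLowerCase(Locale.ROOT).endsWith(".js");

        if (magic >= STRONG) return "text/html";            // strong magic wins
        if (jsName) return "application/javascript";        // name beats weak magic
        if (magic > 0) return "text/html";                  // no JS magic to fall back on
        return "text/plain";
    }

    public static void main(String[] args) {
        String js = "/* jQuery-like library */\n"
                + "var wrap = '<div></div>';\n".repeat(5)
                + "var doc = '<html><body></body></html>';\n";
        byte[] data = js.getBytes(StandardCharsets.UTF_8);
        System.out.println(detect(data, "jquery.js")); // application/javascript
        System.out.println(detect(data, null));        // text/html
    }
}
```

The second call mirrors the limitation mentioned above. With no filename and no JavaScript-specific magic, the only evidence left is the weak HTML hit.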

> javascript files that contain "<html" are detected as text/html
> ---------------------------------------------------------------
>
>                 Key: TIKA-1141
>                 URL: https://issues.apache.org/jira/browse/TIKA-1141
>             Project: Tika
>          Issue Type: Bug
>          Components: mime
>    Affects Versions: 1.2
>            Reporter: David Hara
>            Priority: Minor
>
> The Mimetypes detector will return text/html as the mimetype for any 
> javascript file that contains the string "<html" in it. I believe this is due 
> to the rule <match value="&lt;html" type="string" offset="0:8192"/> in the 
> tika-mimetypes.xml file.
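The following sketch shows why the quoted rule fires on JavaScript. It implements an offset-range string match under a simplified assumption: the pattern may begin at any byte offset from 0 to 8192. The class and method names are hypothetical, and this is not Tika's own magic-matching implementation.

```java
import java.nio.charset.StandardCharsets;

/**
 * Simplified illustration of a magic rule like
 * <match value="&lt;html" type="string" offset="0:8192"/>:
 * the pattern matches if it starts at ANY offset in [start, end].
 * Hypothetical code, not Tika's MagicMatch.
 */
public class OffsetRangeMatch {

    // True if `pattern` occurs beginning at some offset in [start, end].
    static boolean matches(byte[] data, byte[] pattern, int start, int end) {
        int lastStart = Math.min(end, data.length - pattern.length);
        for (int off = start; off <= lastStart; off++) {
            boolean hit = true;
            for (int i = 0; i < pattern.length; i++) {
                if (data[off + i] != pattern[i]) {
                    hit = false;
                    break;
                }
            }
            if (hit) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        byte[] html = "<html".getBytes(StandardCharsets.US_ASCII);
        // JavaScript that merely mentions "<html" inside a string literal.
        String js = "var frag = document.createElement('div');\n"
                + "frag.innerHTML = '<html><body></body></html>';\n";
        byte[] data = js.getBytes(StandardCharsets.UTF_8);
        System.out.println(matches(data, html, 0, 8192)); // true
    }
}
```

Because the start offset is allowed to float across the whole window, any occurrence of "<html" in the first 8 KB counts as a match. That includes one inside a JavaScript string literal.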



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
