[ 
https://issues.apache.org/jira/browse/TIKA-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17569353#comment-17569353
 ] 

Tristan Lins commented on TIKA-1141:
------------------------------------

Our customer must have used a build tool that removed all comments, so the 
{{/*! jQuery}} and {{sourceMappingURL}} comments are no longer available.
Relying on comment sequences is quite risky.
On the other hand, it is also difficult to rely on code sequences, which may 
look different after transformation, depending on the transformer. 🤔

> javascript files that contain "<html" are detected as text/html
> ---------------------------------------------------------------
>
>                 Key: TIKA-1141
>                 URL: https://issues.apache.org/jira/browse/TIKA-1141
>             Project: Tika
>          Issue Type: Bug
>          Components: mime
>    Affects Versions: 1.2
>            Reporter: David Hara
>            Priority: Minor
>         Attachments: jquery-2.0.3.js
>
>
> The Mimetypes detector will return text/html as the mimetype for any 
> javascript file that contains the string "<html" in it. I believe this is due 
> to the rule <match value="&lt;html" type="string" offset="0:8192"/> in the 
> tika-mimetypes.xml file.
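
To illustrate the behaviour described in the issue, here is a minimal sketch of what a rule like the one quoted above amounts to. It is not Tika's implementation: the names {{naive_detect}}, {{HTML_MAGIC}} and {{SEARCH_WINDOW}} are hypothetical, the {{application/octet-stream}} fallback is an assumption, and {{offset="0:8192"}} is read here as "the match may start anywhere from byte 0 to byte 8192".

```python
# Hypothetical stand-in for the magic rule quoted in the issue; not Tika's code.
HTML_MAGIC = b"<html"
SEARCH_WINDOW = 8192  # mirrors offset="0:8192" (interpretation is an assumption)


def naive_detect(data: bytes) -> str:
    """Return text/html if the magic string starts within the search window."""
    # Keep enough bytes for a match that starts at the last allowed offset.
    window = data[: SEARCH_WINDOW + len(HTML_MAGIC)]
    if HTML_MAGIC in window:
        return "text/html"
    # Assumed fallback for this sketch only.
    return "application/octet-stream"


# A JavaScript snippet that merely embeds an HTML string literal.
js_source = b'var tpl = "<html><body></body></html>";'
print(naive_detect(js_source))
```

Under these assumptions the JavaScript snippet is reported as text/html, because the rule only looks for the byte sequence and has no notion of the surrounding language.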



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
