[
https://issues.apache.org/jira/browse/TIKA-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16404356#comment-16404356
]
pdwalker commented on TIKA-2608:
--------------------------------
1.18 snapshot: {color:#FF0000}failure{color}
{{*$ java -jar tika-app-1.18-SNAPSHOT.jar -d mxGraphEditor.min.js*}}
{{Mar 19, 2018 12:11:48 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem}}
{{WARNING: JBIG2ImageReader not loaded. jbig2 files will be ignored}}
{{See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io}}
{{for optional dependencies.}}
{{J2KImageReader not loaded. JPEG2000 files will not be processed.}}
{{See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io}}
{{for optional dependencies.}}
{{Mar 19, 2018 12:11:48 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem}}
{{WARNING: org.xerial's sqlite-jdbc is not loaded.}}
{{Please provide the jar on your classpath to parse sqlite files.}}
{{See tika-parsers/pom.xml for the correct version.}}
{{*text/x-matlab*}}
> tika matlab parser incorrectly identifies content type of minified javascript
> file
> ----------------------------------------------------------------------------------
>
> Key: TIKA-2608
> URL: https://issues.apache.org/jira/browse/TIKA-2608
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.17
> Environment: * xwiki 10.1,
> * Tomcat 8 (8.0.32-1ubuntu1)
> * Ubuntu 16.04.4 LTS
> * Oracle Java 1.8.0_161-b12
> Reporter: pdwalker
> Priority: Minor
> Fix For: 2.0.0
>
>
> When Tika detects the type of the following file, it returns the wrong content type:
> {{$ curl -I
> [https://wiki.charltonslaw.com/xwiki/webjars/wiki%3Ait/mxgraph-editor/3.7.2/mxGraphEditor.min.js]}}
> {{HTTP/1.1 200 OK}}
> {{Server: nginx/1.10.3 (Ubuntu)}}
> {{Date: Fri, 16 Mar 2018 10:09:54 GMT}}
> {{Content-Type: text/x-matlab}}
> {{ [snip]}}
> {{X-Frame-Options: SAMEORIGIN}}
> However, the unminified version of the same file returns the correct type:
> {{$ curl -I
> [https://wiki.charltonslaw.com/xwiki/webjars/wiki%3Ait/mxgraph-editor/3.7.2/mxGraphEditor.js]}}
> {{HTTP/1.1 200 OK}}
> {{Server: nginx/1.10.3 (Ubuntu)}}
> {{Date: Fri, 16 Mar 2018 10:10:25 GMT}}
> {{Content-Type: application/javascript}}
> {{ [snip]}}
> {{X-Frame-Options: SAMEORIGIN}}
> The problem arises when my xwiki installation is behind an SSL proxy
> (nginx) and I enable the {{add_header X-Content-Type-Options nosniff;}}
> directive. Modern browsers then refuse to run the script and report the
> following error:
> {quote}Refused to execute script from
> '[https://wiki.charltonslaw.com/xwiki/webjars/wiki%3Ait/mxgraph-editor/3.7.2/mxGraphEditor.min.js|https://wiki.proxy.domain/xwiki/webjars/wiki%3Ait/mxgraph-editor/3.7.2/mxGraphEditor.min.js]'
> because its MIME type ('text/x-matlab') is not executable, and strict MIME
> type checking is enabled.
> {quote}
> My "solution" is to disable the strict MIME type checking in the SSL proxy,
> but I don't think that is ideal. It'd be better if the matlab parser didn't
> claim random minified js files as its own.
>
> Edit: I marked the problem as being with the matlab parser, but that may be
> incorrect. I'm not sure exactly which code actually does the detection.
>
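For anyone trying to see why the browser rejects the file, here is a minimal sketch of the strict MIME type check quoted in the description. It is not Tika, xwiki, or browser code. The class and method names (NosniffScriptCheck, essence, wouldBlockScript) are made up for illustration, and the list of accepted types is the set of JavaScript MIME types from the WHATWG MIME Sniffing standard. It assumes the browser blocks a script whenever nosniff is set and the served type is not in that list.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: approximates the "nosniff" script check, not real browser code.
public class NosniffScriptCheck {

    // JavaScript MIME types as listed in the WHATWG MIME Sniffing standard.
    private static final Set<String> JAVASCRIPT_MIME_TYPES = new HashSet<>(Arrays.asList(
            "application/ecmascript", "application/javascript",
            "application/x-ecmascript", "application/x-javascript",
            "text/ecmascript", "text/javascript",
            "text/javascript1.0", "text/javascript1.1", "text/javascript1.2",
            "text/javascript1.3", "text/javascript1.4", "text/javascript1.5",
            "text/jscript", "text/livescript",
            "text/x-ecmascript", "text/x-javascript"));

    // Strip parameters such as "; charset=UTF-8" and normalise case.
    static String essence(String contentType) {
        if (contentType == null) {
            return "";
        }
        int semi = contentType.indexOf(';');
        String type = semi >= 0 ? contentType.substring(0, semi) : contentType;
        return type.trim().toLowerCase(Locale.ROOT);
    }

    // True when a script response would be refused under the assumption above.
    static boolean wouldBlockScript(String contentType, boolean nosniff) {
        return nosniff && !JAVASCRIPT_MIME_TYPES.contains(essence(contentType));
    }

    public static void main(String[] args) {
        // The minified file as served in this report.
        System.out.println(wouldBlockScript("text/x-matlab", true));          // true
        // The unminified file.
        System.out.println(wouldBlockScript("application/javascript", true)); // false
        // The workaround: nosniff disabled in the proxy.
        System.out.println(wouldBlockScript("text/x-matlab", false));         // false
    }
}
```

Under that assumption, only the text/x-matlab response with nosniff enabled is refused, which matches the behaviour described in the issue.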
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)