Tetiana Tvardovska created TIKA-3590:
----------------------------------------

             Summary: OSX DMG files wrong MIME type detection (wrong MediaType 
and Supertype)
                 Key: TIKA-3590
                 URL: https://issues.apache.org/jira/browse/TIKA-3590
             Project: Tika
          Issue Type: Bug
          Components: core
    Affects Versions: 2.1.0, 2.0.0-BETA, 2.0.0-ALPHA, 1.27, 1.26
            Reporter: Tetiana Tvardovska


Calling {{mimeSupport.detectMimeTypes}} for OSX DMG files returns the wrong 
value.

DMG files are detected as MIME type *{{"application/zlib"}}* or 
*{{"application/x-bzip"}}*

instead of the expected *{{"application/x-apple-diskimage"}}*.

 

The error is caused by the {{getSupertype}} method, which returns a supertype 
that is too general ({{MediaType.OCTET_STREAM}}) for OSX DMG files instead of 
*{{"application/zlib"}}* or *{{"application/x-bzip"}}*.

 

For information, the DMG MIME type is detected correctly at one point when 
debugging the method:

 
{code:java}
// org/apache/tika/mime/MimeTypes.java:484
public MediaType detect(...
// line 522:
MimeType hint = getMimeType(name);
{code}
The {{hint}} variable gets the correct value 
*{{"application/x-apple-diskimage"}}* here.

But later the {{hint}} value is not reflected in {{possibleTypes}}, which is 
the result of {{applyHint}}:

 
{code:java}
// line 529:
possibleTypes = applyHint(possibleTypes, hint);
{code}
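To illustrate why the hint can be lost at this step, here is a minimal, self-contained sketch. It does not use Tika itself: the class {{ApplyHintSketch}}, the {{SUPERTYPE}} table and the string-based types are hypothetical stand-ins, and the sketch assumes that {{applyHint}} keeps the hint only when it equals, or is a specialization of, one of the magic-detected candidates.

```java
import java.util.List;
import java.util.Map;

public class ApplyHintSketch {
    // Hypothetical supertype table (child -> parent); not Tika's real registry.
    // It mirrors the situation described above: the DMG type sits directly
    // under application/octet-stream.
    static final Map<String, String> SUPERTYPE = Map.of(
            "application/x-apple-diskimage", "application/octet-stream",
            "application/x-bzip", "application/octet-stream",
            "application/zlib", "application/octet-stream");

    // True when 'a' is a strict descendant of 'b' in the supertype chain.
    static boolean isSpecializationOf(String a, String b) {
        String parent = SUPERTYPE.get(a);
        while (parent != null) {
            if (parent.equals(b)) {
                return true;
            }
            parent = SUPERTYPE.get(parent);
        }
        return false;
    }

    // Assumed behaviour: the name-based hint wins only if it equals or
    // specializes one of the magic-based candidates.
    static List<String> applyHint(List<String> possibleTypes, String hint) {
        if (possibleTypes.isEmpty()) {
            return List.of(hint);
        }
        for (String type : possibleTypes) {
            if (hint.equals(type) || isSpecializationOf(hint, type)) {
                return List.of(hint);
            }
        }
        return possibleTypes; // the hint is discarded
    }

    public static void main(String[] args) {
        // Magic detection found bzip data; the file name suggested a DMG.
        System.out.println(applyHint(
                List.of("application/x-bzip"),
                "application/x-apple-diskimage"));
    }
}
```

Under these assumptions the DMG hint is dropped, because its only ancestor in the table is {{application/octet-stream}}, and the magic-based candidate is returned unchanged.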
 

This wrong value is then returned to:

 
{code:java}
// repository/org/apache/tika/tika-core/1.26/tika-core-1.26-sources.jar!/org/apache/tika/detect/CompositeDetector.java:84
MediaType detected = detector.detect(input, metadata);
if (registry.isSpecializationOf(detected, type)) {
    type = detected;
}
{code}
 

 
h3. Possible solution - Add a more precise Supertype detection for the {{application/x-apple-diskimage}} type

Just add one more check to the {{MediaTypeRegistry.getSupertype}} method, for 
example (in a 'diff'-like format):

{{org/apache/tika/tika-core/1.26/tika-core-1.26-sources.jar}}
{{org/apache/tika/mime/MediaTypeRegistry.java:187}}

 
{code:java}
public MediaType getSupertype(MediaType type) {
 ...
+    } else if (type.getSubtype().endsWith("x-apple-diskimage")) { 
+        return MediaType.application("x-bzip");
+    }
...
}
{code}
 

or
{code:java}
public MediaType getSupertype(MediaType type) {
 ...
+    } else if (type.getSubtype().endsWith("x-apple-diskimage")) { 
+        return MediaType.APPLICATION_ZIP;
+    }
...
}
{code}
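The effect of such a branch on the specialization check can be sketched without Tika. The class {{SupertypeSketch}}, its string-based types and the {{withProposedBranch}} flag are hypothetical; the model covers only the types named in this issue and is not the real {{MediaTypeRegistry}} code.

```java
public class SupertypeSketch {
    static final String OCTET_STREAM = "application/octet-stream";

    // Simplified stand-in for getSupertype: every type falls back to
    // application/octet-stream, unless the proposed DMG branch is enabled.
    static String getSupertype(String type, boolean withProposedBranch) {
        if (type.equals(OCTET_STREAM)) {
            return null; // root of the hierarchy in this model
        }
        if (withProposedBranch && type.endsWith("x-apple-diskimage")) {
            return "application/x-bzip"; // the first proposed variant
        }
        return OCTET_STREAM;
    }

    // True when 'a' is a strict descendant of 'b' in the supertype chain.
    static boolean isSpecializationOf(String a, String b, boolean withProposedBranch) {
        for (String p = getSupertype(a, withProposedBranch);
                p != null;
                p = getSupertype(p, withProposedBranch)) {
            if (p.equals(b)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String dmg = "application/x-apple-diskimage";
        // Without the branch the DMG type is not a specialization of x-bzip.
        System.out.println(isSpecializationOf(dmg, "application/x-bzip", false));
        // With the branch it is, so a name-based DMG result could replace x-bzip.
        System.out.println(isSpecializationOf(dmg, "application/x-bzip", true));
    }
}
```

In this model the branch helps only when magic detection yields the same type that the branch returns: with the {{x-bzip}} variant, a DMG detected as {{application/zlib}} would still not be treated as a specialization.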
 

 

---

Tested in the [Sonatype Nexus|https://github.com/sonatype/nexus-public/] 
project, {{release-3.36.0-01}}, with a RAW repository that has "Strict Content 
Type Validation" set ON, when trying to upload *.dmg files.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
