Tetiana Tvardovska created TIKA-3590: ----------------------------------------
Summary: OSX DMG files wrong MIME type detection (wrong MediaType and Supertype)
Key: TIKA-3590
URL: https://issues.apache.org/jira/browse/TIKA-3590
Project: Tika
Issue Type: Bug
Components: core
Affects Versions: 2.1.0, 2.0.0-BETA, 2.0.0-ALPHA, 1.27, 1.26
Reporter: Tetiana Tvardovska

Calling {{mimeSupport.detectMimeTypes}} for OSX DMG files returns the wrong value. DMG files are detected as *{{"application/zlib"}}* or *{{"application/x-bzip"}}* instead of the expected *{{"application/x-apple-diskimage"}}*.

The error is caused by the {{getSupertype}} method, which returns a type that is too generic ({{MediaType.OCTET_STREAM}}) for OSX DMG files instead of *{{"application/zlib"}}* or *{{"application/x-bzip"}}*.

For information, the DMG MIME type is detected correctly when debugging this method:
{code:java}
// org/apache/tika/mime/MimeTypes.java:484
public MediaType detect(...
// line 522:
MimeType hint = getMimeType(name);
{code}
Here the {{hint}} variable gets the correct value *{{"application/x-apple-diskimage"}}*. However, the {{hint}} is then not taken into account for {{possibleTypes}}: {{applyHint}} only keeps a hint that equals or specializes one of the magic-detected types, and since the supertype of {{application/x-apple-diskimage}} is {{application/octet-stream}}, the hint is discarded:
{code:java}
// line 529:
possibleTypes = applyHint(possibleTypes, hint);
{code}
The wrong value is then returned to:
{code:java}
// repository/org/apache/tika/tika-core/1.26/tika-core-1.26-sources.jar!/org/apache/tika/detect/CompositeDetector.java:84
MediaType detected = detector.detect(input, metadata);
if (registry.isSpecializationOf(detected, type)) {
    type = detected;
}
{code}

h3. Possible solution

- Add a more precise supertype detection for the *{{"application/x-apple-diskimage"}}* type.

Just add one more check to the {{MediaTypeRegistry.getSupertype}} method, for example (in a diff-like format):

{{org/apache/tika/tika-core/1.26/tika-core-1.26-sources.jar}}
{{org/apache/tika/mime/MediaTypeRegistry.java:187}}
{code:java}
public MediaType getSupertype(MediaType type) {
    ...
+   } else if (type.getSubtype().endsWith("x-apple-diskimage")) {
+       return MediaType.application("x-bzip");
+   }
    ...
}
{code}
or
{code:java}
public MediaType getSupertype(MediaType type) {
    ...
+   } else if (type.getSubtype().endsWith("x-apple-diskimage")) {
+       return MediaType.APPLICATION_ZIP;
+   }
    ...
}
{code}
----
Tested in [Sonatype Nexus|https://github.com/sonatype/nexus-public/] {{release-3.36.0-01}} on a RAW repository with "Strict Content Type Validation" turned on, when uploading {{*.dmg}} files.
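To illustrate the interaction between {{getSupertype}}, {{applyHint}} and the {{CompositeDetector}} check described in this report, here is a minimal, self-contained sketch. It does not use Tika: {{DmgHintModel}} and all of its methods are hypothetical stand-ins that model the behaviour under simplifying assumptions (type names are plain strings, and any type without an explicitly registered parent falls back to {{application/octet-stream}}). The "patched" instance models the first proposed fix (x-bzip as the supertype of the DMG type).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, simplified model of the supertype-based hint handling.
 * NOT Tika code: names and structure are illustrative assumptions.
 */
final class DmgHintModel {
    static final String OCTET_STREAM = "application/octet-stream";

    // Explicit child -> parent relationships (stand-in for the registry's inheritance map).
    private final Map<String, String> parents = new HashMap<>();

    void addSupertype(String type, String supertype) {
        parents.put(type, supertype);
    }

    // Simplified getSupertype: explicit parent if registered, otherwise octet-stream
    // (and null for octet-stream itself, the root of the hierarchy).
    String getSupertype(String type) {
        if (parents.containsKey(type)) {
            return parents.get(type);
        }
        return OCTET_STREAM.equals(type) ? null : OCTET_STREAM;
    }

    // True if 'a' is a strict descendant of 'b' in the supertype chain.
    boolean isSpecializationOf(String a, String b) {
        for (String t = getSupertype(a); t != null; t = getSupertype(t)) {
            if (t.equals(b)) {
                return true;
            }
        }
        return false;
    }

    // Models the idea of applyHint: keep the filename hint only if it equals
    // or specializes one of the magic-detected types; otherwise drop it.
    List<String> applyHint(List<String> possibleTypes, String hint) {
        if (possibleTypes.isEmpty()) {
            return Collections.singletonList(hint);
        }
        for (String type : possibleTypes) {
            if (hint.equals(type) || isSpecializationOf(hint, type)) {
                return Collections.singletonList(hint);
            }
        }
        return possibleTypes; // hint discarded
    }

    // Models the CompositeDetector loop: a later result replaces the current
    // type only if it is a specialization of it.
    String combine(List<String> detectorResults) {
        String type = OCTET_STREAM;
        for (String detected : detectorResults) {
            if (isSpecializationOf(detected, type)) {
                type = detected;
            }
        }
        return type;
    }

    public static void main(String[] args) {
        String hint = "application/x-apple-diskimage";
        List<String> bzip = Collections.singletonList("application/x-bzip");
        List<String> zlib = Collections.singletonList("application/zlib");

        // Current behaviour: the DMG type falls back to octet-stream, so the hint is dropped.
        DmgHintModel current = new DmgHintModel();
        System.out.println("current: " + current.applyHint(bzip, hint));

        // First proposed fix: register x-bzip as the supertype of the DMG type.
        DmgHintModel patched = new DmgHintModel();
        patched.addSupertype(hint, "application/x-bzip");
        System.out.println("patched: " + patched.applyHint(bzip, hint));
        System.out.println("patched, zlib DMG: " + patched.applyHint(zlib, hint));

        // CompositeDetector-style combination: a later DMG result cannot replace zlib.
        System.out.println("combined: " + current.combine(Arrays.asList("application/zlib", hint)));
    }
}
```

Running {{main}} prints {{current: [application/x-bzip]}}, {{patched: [application/x-apple-diskimage]}}, {{patched, zlib DMG: [application/zlib]}} and {{combined: application/zlib}}. In this model a single supertype entry only lets the hint win for the compression format it names, so a bzip2-based supertype would still leave zlib-compressed DMG files detected as {{application/zlib}}.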