[
https://issues.apache.org/jira/browse/TIKA-4612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
V. S. updated TIKA-4612:
------------------------
Description:
When the attached test.mp3 file is passed to Tika.detect, _all versions since
Tika 2.9.0_ incorrectly report "audio/x-aac" instead of "audio/mpeg". Tika
2.8.0 reports "audio/mpeg" correctly.
I suspect this is related to the priority setting here, but I am not sure how
that setting works:
[https://github.com/apache/tika/blob/main/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml#L6166|https://github.com/apache/tika/blob/3.2.3/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml#L6166]
Note that, for legal reasons, I can only supply the first 1024 bytes of the MP3
file. However, this seems to be enough to trigger the detection logic.
This error occurred with about 30% of the MP3 files we processed.
Other tools correctly report MP3, e.g.
{{$ file test.mp3 }}
{{test.mp3: Audio file with ID3 version 2.3.0, contains:\012- MPEG ADTS, layer
III, v2, 64 kbps, 16 kHz, JntStereo}}
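To cross-check a file independently of both Tika and file, the leading bytes can be inspected by hand. The following is only a rough triage sketch: it assumes the file starts with an optional ID3v2 tag followed directly by an audio frame, and it relies on the fact that MPEG audio frames and ADTS AAC frames share the same sync pattern and differ in the two layer bits (00 for ADTS AAC, 01 for Layer III). The class FrameSniffer and its method names are hypothetical, are not part of Tika, and do not reproduce Tika's detection logic.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical triage helper; not part of Tika and not a model of its detector.
final class FrameSniffer {

    /** Offset of the first byte after a leading ID3v2 tag, or 0 if there is no tag. */
    static int audioOffset(byte[] data) {
        if (data.length < 10 || data[0] != 'I' || data[1] != 'D' || data[2] != '3') {
            return 0;
        }
        // The tag size is stored as four synchsafe bytes (7 significant bits each)
        // and excludes the 10-byte tag header.
        int size = ((data[6] & 0x7F) << 21) | ((data[7] & 0x7F) << 14)
                | ((data[8] & 0x7F) << 7) | (data[9] & 0x7F);
        // ID3v2.4 may append a 10-byte footer, signalled by flag bit 0x10.
        int footer = (data[5] & 0x10) != 0 ? 10 : 0;
        return 10 + size + footer;
    }

    /** Classifies the two header bytes found at the first audio offset. */
    static String classify(byte[] data) {
        int off = audioOffset(data);
        if (off + 1 >= data.length) {
            return "unknown"; // the tag extends past the bytes we have
        }
        int b0 = data[off] & 0xFF;
        int b1 = data[off + 1] & 0xFF;
        // MPEG audio uses an 11-bit sync word; ADTS uses 12 bits.
        if (b0 != 0xFF || (b1 & 0xE0) != 0xE0) {
            return "unknown"; // no frame sync at this offset
        }
        int layerBits = (b1 >> 1) & 0x03;
        switch (layerBits) {
            case 1: return "mpeg-layer-3";
            case 2: return "mpeg-layer-2";
            case 3: return "mpeg-layer-1";
            default:
                // Layer bits 00 are reserved in MPEG audio but mandatory in ADTS.
                return (b1 & 0xF0) == 0xF0 ? "aac-adts" : "unknown";
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(classify(data));
    }
}
```

With Java 11 or later this can be run as {{java FrameSniffer.java test.mp3}}. A result of "unknown" on a truncated sample may simply mean that the ID3v2 tag is longer than the bytes supplied, so the first audio frame is not available for inspection.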
Minimal test program:
{{package com.example;}}
{{import org.apache.tika.Tika;}}
{{import java.io.FileInputStream;}}
{{import java.io.IOException;}}
{{public class TikaTest {}}
{{    public static void main(String[] args) {}}
{{        Tika tika = new Tika();}}
{{        try (FileInputStream fis = new FileInputStream("test.mp3")) {}}
{{            System.out.println(tika.detect(fis));}}
{{        } catch (IOException e) {}}
{{            e.printStackTrace();}}
{{        }}}
{{    }}}
{{}}}
> Some mp3 files are detected as audio/x-aac instead of audio/mpeg
> ----------------------------------------------------------------
>
> Key: TIKA-4612
> URL: https://issues.apache.org/jira/browse/TIKA-4612
> Project: Tika
> Issue Type: Bug
> Affects Versions: 2.9.0, 3.2.3
> Reporter: V. S.
> Priority: Major
> Attachments: test.mp3
>