[
https://issues.apache.org/jira/browse/TIKA-4612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050855#comment-18050855
]
Tim Allison commented on TIKA-4612:
-----------------------------------
Rather than play whack-a-mole with priorities, I chatted with Claude for
recommendations. I've attached the discussion. Let's go with the "both"
approach. I'll add the attached test.mp3 as a unit test file.
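
For readers without the attachment, here is a rough sketch of why the two types can collide at all. It is not the planned fix and not Tika's detection code. It only illustrates the general frame-header layout: an MPEG audio frame and an AAC ADTS frame both begin with a run of set sync bits, and the two "layer" bits that follow tell them apart (ADTS uses 00, MPEG Layer I/II/III use a non-zero value). The `file` output quoted in the issue below reports an ID3v2 tag followed by "layer III", so the sketch skips a leading ID3v2 tag first. The class and method names are made up for illustration, and the sketch assumes the first frame starts directly after the tag (no padding scan, no ID3v2.4 footer handling).

```java
class FrameSniffer {

    /** Returns the offset just past a leading ID3v2 tag, or 0 if there is none. */
    static int skipId3v2(byte[] data) {
        if (data.length < 10 || data[0] != 'I' || data[1] != 'D' || data[2] != '3') {
            return 0;
        }
        // Bytes 6..9 hold the tag body size as a "syncsafe" integer:
        // four bytes, 7 significant bits each.
        int size = ((data[6] & 0x7F) << 21)
                 | ((data[7] & 0x7F) << 14)
                 | ((data[8] & 0x7F) << 7)
                 |  (data[9] & 0x7F);
        // 10-byte tag header plus body; an optional ID3v2.4 footer is ignored here.
        return 10 + size;
    }

    /** Classifies the two header bytes found right after any ID3v2 tag. */
    static String classify(byte[] data) {
        int pos = skipId3v2(data);
        if (pos + 1 >= data.length) {
            return "unknown"; // not enough bytes left for a frame header
        }
        int b0 = data[pos] & 0xFF;
        int b1 = data[pos + 1] & 0xFF;
        // MPEG audio sync: 11 set bits (0xFF followed by 0b111x_xxxx).
        if (b0 != 0xFF || (b1 & 0xE0) != 0xE0) {
            return "unknown";
        }
        int layer = (b1 >> 1) & 0x03; // 00 = ADTS/reserved, 01 = Layer III, 10 = II, 11 = I
        if (layer == 0) {
            // ADTS additionally requires the 12th sync bit to be set.
            return (b1 & 0xF0) == 0xF0 ? "audio/x-aac" : "unknown";
        }
        return "audio/mpeg";
    }

    public static void main(String[] args) {
        // Synthetic input: empty ID3v2.3 tag, then an MPEG-2 Layer III header (0xFF 0xF3).
        byte[] sample = {'I', 'D', '3', 3, 0, 0, 0, 0, 0, 0, (byte) 0xFF, (byte) 0xF3, 0, 0};
        System.out.println(classify(sample));
    }
}
```

Note that an MPEG-2 Layer III header such as 0xFF 0xF3 also satisfies the 12-bit ADTS sync pattern, so a check on the sync bits alone cannot separate the two types; the layer bits have to be inspected as well.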
> Some mp3 files are detected as audio/x-aac instead of audio/mpeg
> ----------------------------------------------------------------
>
> Key: TIKA-4612
> URL: https://issues.apache.org/jira/browse/TIKA-4612
> Project: Tika
> Issue Type: Bug
> Affects Versions: 2.9.0, 3.2.3
> Reporter: V. S.
> Assignee: Tim Allison
> Priority: Major
> Attachments: mp3-v-aac-claude.txt, test.mp3
>
>
> When the attached test.mp3 file is passed to Tika.detect, _all versions since
> Tika 2.9.0_ incorrectly report "audio/x-aac" instead of "audio/mpeg". Tika
> 2.8.0 reports "audio/mpeg" correctly.
> I believe this might be due to the priority setting here, but I am not sure
> how this works:
> [https://github.com/apache/tika/blob/main/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml#L6166|https://github.com/apache/tika/blob/3.2.3/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml#L6166]
> Note that I can only supply the first 1024 bytes of the MP3 file due to legal
> reasons. However, this seems to be enough for the detection logic.
> This error has occurred in about 30% of the MP3 files we were processing.
>
> Other tools correctly report MP3, e.g.
>     $ file test.mp3
>     test.mp3: Audio file with ID3 version 2.3.0, contains:\012- MPEG ADTS, layer III, v2, 64 kbps, 16 kHz, JntStereo
>
> Minimal test program:
>     package com.example;
>
>     import org.apache.tika.Tika;
>     import java.io.FileInputStream;
>     import java.io.IOException;
>
>     public class TikaTest {
>         public static void main(String[] args) {
>             Tika tika = new Tika();
>
>             try (FileInputStream fis = new FileInputStream("test.mp3")) {
>                 System.out.println(tika.detect(fis));
>             } catch (IOException e) {
>                 e.printStackTrace();
>             }
>         }
>     }
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)