[
https://issues.apache.org/jira/browse/TIKA-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346415#comment-14346415
]
Nick Burch commented on TIKA-1039:
----------------------------------
Without writing a dedicated detector, I'm not sure how else we could do this.
The logic of looking for an ID3 frame header OR an audio frame header +
suitable gap + another audio frame header of the same type is beyond what the
Tika mime magic match rules support. We can't write magic rules for raw image
files either, as they have no header and could start with and contain anything.
Unless someone else can think up a cunning way round this, I fear it may have
to be a Won't Fix.
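
For illustration only, the kind of check described above (ID3 header, or an
audio frame header followed by a second matching header one frame length
later) might be sketched as below. This is not Tika code and does not use any
Tika API; the function names are hypothetical, and it assumes MPEG-1 Layer III
only, treats the "free" bitrate index as invalid, and requires the first frame
to start at offset 0.

```python
# Hypothetical sketch: MPEG-1 Layer III only, not a complete MPEG audio parser.
MPEG1_L3_BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96,
                          112, 128, 160, 192, 224, 256, 320, None]
MPEG1_SAMPLE_RATES_HZ = [44100, 48000, 32000, None]


def parse_frame_header(data, offset):
    """Return (sample_rate, frame_length) for a valid header at offset, else None."""
    if offset + 4 > len(data):
        return None
    b0, b1, b2 = data[offset], data[offset + 1], data[offset + 2]
    # 11-bit frame sync: all bits of byte 0 and the top 3 bits of byte 1.
    if b0 != 0xFF or (b1 & 0xE0) != 0xE0:
        return None
    version = (b1 >> 3) & 0x03  # 3 means MPEG-1
    layer = (b1 >> 1) & 0x03    # 1 means Layer III
    if version != 3 or layer != 1:
        return None
    bitrate = MPEG1_L3_BITRATES_KBPS[(b2 >> 4) & 0x0F]
    sample_rate = MPEG1_SAMPLE_RATES_HZ[(b2 >> 2) & 0x03]
    if bitrate is None or sample_rate is None:
        return None
    padding = (b2 >> 1) & 0x01
    frame_length = 144 * bitrate * 1000 // sample_rate + padding
    return sample_rate, frame_length


def looks_like_mpeg_audio(data):
    """ID3 tag at the start, OR two consecutive frame headers of the same type."""
    if data[:3] == b"ID3":
        return True
    first = parse_frame_header(data, 0)
    if first is None:
        return False
    sample_rate, frame_length = first
    # The "suitable gap": the next header must sit exactly one frame later.
    second = parse_frame_header(data, frame_length)
    return second is not None and second[0] == sample_rate
```

With a check like this, a buffer of all FF bytes fails because its header bits
decode to Layer I with invalid bitrate and sample rate indices, while two
back-to-back 128 kbps / 44.1 kHz frames pass. Raw image data could still
match by chance, only far less often than with a plain FF prefix rule.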
> Raw image file detected as audio/mpeg
> -------------------------------------
>
> Key: TIKA-1039
> URL: https://issues.apache.org/jira/browse/TIKA-1039
> Project: Tika
> Issue Type: Bug
> Components: mime
> Affects Versions: 1.2
> Reporter: Oliver Boldt
> Attachments: SimpleTestFile.raw
>
>
> A raw image file that starts with a long sequence of FFFF... is recognised
> as audio/mpeg.
> The problem is that the raw file has no magic number of its own, so the
> FF... pixel data is wrongly interpreted as an MPEG file. This seems to be a
> general problem, because other image data could likewise be misinterpreted
> as other magic numbers.