[ https://issues.apache.org/jira/browse/TIKA-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17400920#comment-17400920 ]
Nick Burch commented on TIKA-3528:
----------------------------------
Currently we detect the video format based on the overall container ASF
magic, then specialise to the audio format if we find the audio header near the
start. Unfortunately your video has an audio track near the beginning, so that
is triggering the match.

Based on
[https://docs.microsoft.com/en-us/windows/win32/wmformat/overview-of-the-asf-format]
and the attached Word document for the file specification, we might be able to
detect the type better from the Stream Type GUID in the Stream Properties
Object. However, at first glance it seems that the Stream Properties Object
isn't in a fixed place (I can't see anything saying what order the header
objects have to come in), so it might be a bit of a gnarly match...
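
To make the idea concrete, here is a rough Python sketch (not Tika code) of
walking the top-level objects inside the ASF Header Object and collecting the
Stream Type GUID from each Stream Properties Object. The function names
(stream_types, guess_asf_type) are made up for illustration, the GUID values
and the 24-byte / 30-byte offsets are as I read them from the spec, and the
sketch assumes the Stream Properties Objects sit at the top level of the
header rather than inside a Header Extension Object, which may not hold for
every file.

```python
import struct
import uuid

# GUIDs as listed in the ASF specification (assumed correct, please verify).
ASF_HEADER_OBJECT = uuid.UUID("75B22630-668E-11CF-A6D9-00AA0062CE6C")
ASF_STREAM_PROPERTIES_OBJECT = uuid.UUID("B7DC0791-A9B7-11CF-8EE6-00C00C205365")
ASF_AUDIO_MEDIA = uuid.UUID("F8699E40-5B4D-11CF-A8FD-00805F5C442B")
ASF_VIDEO_MEDIA = uuid.UUID("BC19EFC0-5B4D-11CF-A8FD-00805F5C442B")

OBJECT_HEADER_LEN = 24    # 16-byte GUID + 8-byte little-endian object size
HEADER_PREAMBLE_LEN = 30  # object header + 4-byte object count + 2 reserved bytes


def read_guid(data, offset):
    """Decode a GUID stored in the mixed-endian on-disk layout ASF uses."""
    return uuid.UUID(bytes_le=bytes(data[offset:offset + 16]))


def is_asf(data):
    """True if the buffer starts with the ASF Header Object GUID."""
    return len(data) >= HEADER_PREAMBLE_LEN and read_guid(data, 0) == ASF_HEADER_OBJECT


def stream_types(data):
    """Return the Stream Type GUIDs of top-level Stream Properties Objects."""
    if not is_asf(data):
        return []
    (header_size,) = struct.unpack_from("<Q", data, 16)
    end = min(header_size, len(data))
    pos = HEADER_PREAMBLE_LEN
    found = []
    while pos + OBJECT_HEADER_LEN <= end:
        guid = read_guid(data, pos)
        (size,) = struct.unpack_from("<Q", data, pos + 16)
        if size < OBJECT_HEADER_LEN:
            break  # malformed object size, stop rather than loop forever
        # The Stream Type GUID is the first field after the object header.
        if guid == ASF_STREAM_PROPERTIES_OBJECT and pos + OBJECT_HEADER_LEN + 16 <= end:
            found.append(read_guid(data, pos + OBJECT_HEADER_LEN))
        pos += size
    return found


def guess_asf_type(data):
    """Hypothetical decision rule: any video stream wins over audio."""
    if not is_asf(data):
        return None
    types = stream_types(data)
    if ASF_VIDEO_MEDIA in types:
        return "video/x-ms-asf"
    if ASF_AUDIO_MEDIA in types:
        return "audio/x-ms-wma"
    return "video/x-ms-asf"  # fall back to the container-level type
```

Because the loop follows each object's size field rather than looking at a
fixed offset, the order of the header objects doesn't matter, which is exactly
what a plain magic-byte match can't express.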

A parser to grab the basic metadata of ASF container formats wouldn't be too
bad to write though, based on the details in the spec. Just need to define the
various structures, which could be a good project to try out Kaitai on?
> WMV file detected as WMA (audio/x-ms-wma)
> -----------------------------------------
>
> Key: TIKA-3528
> URL: https://issues.apache.org/jira/browse/TIKA-3528
> Project: Tika
> Issue Type: Bug
> Components: mime
> Reporter: Nitish Gupta
> Priority: Major
>
> Attached file is detected as "audio/x-ms-wma" instead of "video/x-ms-asf".
> Link :
> [https://drive.google.com/file/d/1yB1_RcMxINHSs2s5AQHG4QrEdGWzJwy6/view?usp=sharing]
>