[ 
https://issues.apache.org/jira/browse/TIKA-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17400920#comment-17400920
 ] 

Nick Burch commented on TIKA-3528:
----------------------------------

Currently we detect the video format based on the overall ASF container 
magic, then specialise to the audio format if we find the audio header near the 
start. Unfortunately your video has an audio track near the beginning, so that 
is triggering the audio match.

Based on 
[https://docs.microsoft.com/en-us/windows/win32/wmformat/overview-of-the-asf-format]
 and the attached Word document for the file specification, we might be able to 
detect the type better from the Stream Type GUID in the Stream Properties 
Object. However, at first glance it seems that the Stream Properties Object 
isn't in a fixed place (I can't see anything saying what order the header 
objects have to come in), so it might be a bit of a gnarly match...
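
To make the idea concrete, here is a rough sketch of walking the top-level header objects and looking at the Stream Type GUID of each Stream Properties Object, rather than relying on a fixed offset. It is not Tika code: the class and method names are made up for illustration, and the GUID byte values are as I read them from the ASF specification, so they should be checked against the spec before use. It also assumes the Stream Properties Objects sit directly in the Header Object and does not look inside the Header Extension Object.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical helper, not part of Tika.
final class AsfStreamSniffer {
    // GUIDs in on-disk byte order (first three fields are little-endian).
    // Values as read from the ASF specification; verify before relying on them.
    static final byte[] HEADER_OBJECT = hex("3026B2758E66CF11A6D900AA0062CE6C");
    static final byte[] STREAM_PROPERTIES = hex("9107DCB7B7A9CF118EE600C00C205365");
    static final byte[] AUDIO_MEDIA = hex("409E69F84D5BCF11A8FD00805F5C442B");
    static final byte[] VIDEO_MEDIA = hex("C0EF19BC4D5BCF11A8FD00805F5C442B");

    static byte[] hex(String s) {
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    static boolean guidAt(byte[] data, int pos, byte[] guid) {
        return pos >= 0 && pos + 16 <= data.length
                && Arrays.equals(Arrays.copyOfRange(data, pos, pos + 16), guid);
    }

    /** Returns "video", "audio" or "unknown" for the leading bytes of a file. */
    static String sniff(byte[] data) {
        // Header Object: GUID (16) + size (8) + object count (4) + 2 reserved bytes.
        if (!guidAt(data, 0, HEADER_OBJECT) || data.length < 30) {
            return "unknown";
        }
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        long headerEnd = Math.min(buf.getLong(16), data.length);
        boolean sawAudio = false;
        int pos = 30;
        while (pos + 24 <= headerEnd) {
            long size = buf.getLong(pos + 16); // size includes the 24-byte object header
            if (size < 24) {
                break; // malformed object, stop rather than loop forever
            }
            if (guidAt(data, pos, STREAM_PROPERTIES)) {
                // The Stream Type GUID is the first field after GUID + size.
                if (guidAt(data, pos + 24, VIDEO_MEDIA)) {
                    return "video"; // any video stream wins over audio streams
                }
                if (guidAt(data, pos + 24, AUDIO_MEDIA)) {
                    sawAudio = true;
                }
            }
            if (size > headerEnd - pos) {
                break; // object runs past the bytes we have
            }
            pos += (int) size;
        }
        return sawAudio ? "audio" : "unknown";
    }
}
```

The point of the design is that an audio stream listed first no longer decides the result: the walk only reports audio if no video stream turns up anywhere in the header objects it can see.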

A parser to grab the basic metadata of ASF container formats wouldn't be too 
bad to write though, based on the details in the spec. We'd just need to define 
the various structures, which could be a good project to try out Kaitai on?
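
As an illustration of what "defining the structures" might look like if done by hand in Java instead of Kaitai, here is a minimal sketch of the common 24-byte object header (a GUID followed by a 64-bit little-endian size). The class name and layout are hypothetical and not taken from Tika. It assumes the usual GUID encoding, where the first three fields are stored little-endian and the last eight bytes are stored as-is.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.UUID;

// Hypothetical structure for illustration; not an existing Tika class.
final class AsfObjectHeader {
    final UUID id;   // object GUID in canonical form
    final long size; // total object size in bytes, including this header

    AsfObjectHeader(UUID id, long size) {
        this.id = id;
        this.size = size;
    }

    /** Reads the 24-byte object header starting at the given offset. */
    static AsfObjectHeader read(byte[] data, int offset) {
        ByteBuffer buf = ByteBuffer.wrap(data, offset, 24).order(ByteOrder.LITTLE_ENDIAN);
        // First three GUID fields are little-endian on disk.
        long d1 = buf.getInt() & 0xFFFFFFFFL;
        long d2 = buf.getShort() & 0xFFFFL;
        long d3 = buf.getShort() & 0xFFFFL;
        long msb = (d1 << 32) | (d2 << 16) | d3;
        // Remaining eight GUID bytes are stored in order.
        long lsb = buf.order(ByteOrder.BIG_ENDIAN).getLong();
        // Object size is a little-endian 64-bit value.
        long size = buf.order(ByteOrder.LITTLE_ENDIAN).getLong();
        return new AsfObjectHeader(new UUID(msb, lsb), size);
    }
}
```

Every object in the header shares this prefix, so a metadata parser could read it first and then dispatch on `id` to a structure-specific reader.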

> WMV file detected as WMA (audio/x-ms-wma)
> -----------------------------------------
>
>                 Key: TIKA-3528
>                 URL: https://issues.apache.org/jira/browse/TIKA-3528
>             Project: Tika
>          Issue Type: Bug
>          Components: mime
>            Reporter: Nitish Gupta
>            Priority: Major
>
> Attached file is detected as "audio/x-ms-wma" instead of "video/x-ms-asf".
> Link : 
> [https://drive.google.com/file/d/1yB1_RcMxINHSs2s5AQHG4QrEdGWzJwy6/view?usp=sharing]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
