I have just checked this page <http://www.rgagnon.com/javadetails/java-0487.html>.
From what I can see, using Tika *only* for MIME type detection is like
"killing flies with a cannon". The "real" metadata detection is done with
MediaInfo, which is one of the best metadata detection tools; its only flaw
(for our purposes) is that it does not return MIME types.
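
To illustrate how small that missing piece might be, here is a hypothetical mapping from the container format name that MediaInfo already reports to a MIME type. The format strings ("MPEG-4", "MPEG-PS", ...) are assumptions based on typical MediaInfo output, and the class and method names are made up for this sketch; it is not existing Matterhorn code.

```java
import java.util.Map;

/**
 * Hypothetical helper (not existing Matterhorn code): turns the container
 * format name reported by a MediaInfo-style inspection into a MIME type.
 * The format strings are assumptions; check them against real MediaInfo output.
 */
public class FormatToMimeType {

    // Assumed container names for files that contain at least one video track.
    private static final Map<String, String> VIDEO = Map.of(
            "MPEG-4", "video/mp4",
            "MPEG-PS", "video/mpeg",
            "MPEG-TS", "video/mp2t",
            "Matroska", "video/x-matroska",
            "WebM", "video/webm");

    // Assumed container names for audio-only files.
    private static final Map<String, String> AUDIO = Map.of(
            "MPEG-4", "audio/mp4",
            "MPEG Audio", "audio/mpeg",
            "Matroska", "audio/x-matroska",
            "WebM", "audio/webm");

    /** Chooses the table by track type; unknown formats get a generic default. */
    public static String mimeTypeFor(String containerFormat, boolean hasVideoTrack) {
        Map<String, String> table = hasVideoTrack ? VIDEO : AUDIO;
        return table.getOrDefault(containerFormat, "application/octet-stream");
    }

    public static void main(String[] args) {
        System.out.println(mimeTypeFor("MPEG-PS", true));
        System.out.println(mimeTypeFor("MPEG-4", false));
    }
}
```

The split into video and audio tables is there because one container (e.g. MPEG-4) can hold audio only, so the format name by itself is not enough; the track information MediaInfo already extracts resolves that.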

If I were you, I would switch to a library with fewer dependencies and a
narrower focus than Tika. The activation .jar is already used in Matterhorn,
and the "fileNameMap" method seems to be Java's native approach to the issue.
On the other hand, the JMimeMagic and mime-util approaches seem lightweight,
efficient and to the point. Of course, I haven't tested them myself; this is
just my general impression.
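
For reference, the JDK's built-in filename lookup is reached through URLConnection.getFileNameMap(), presumably the "fileNameMap" approach mentioned above. A minimal sketch, assuming only the standard library (the class and method names are made up for illustration):

```java
import java.net.FileNameMap;
import java.net.URLConnection;

/** Hypothetical helper: extension-based MIME lookup using only the JDK. */
public class FileNameMimeLookup {

    // The JDK's built-in extension table (loaded from its content-types.properties).
    private static final FileNameMap FILE_NAME_MAP = URLConnection.getFileNameMap();

    /** Returns the MIME type guessed from the file name, or a generic default. */
    public static String mimeTypeFor(String fileName) {
        String type = FILE_NAME_MAP.getContentTypeFor(fileName);
        return type != null ? type : "application/octet-stream";
    }

    public static void main(String[] args) {
        for (String name : new String[] {"slides.gif", "lecture.mpg", "notes.zzqx"}) {
            System.out.println(name + " -> " + mimeTypeFor(name));
        }
    }
}
```

Note that this only looks at the extension, so it shares the weakness of the current MimeType class: it cannot tell an audio-only file from a video file that uses the same extension.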

Anything that helps us prune the thick bush of dependencies that Matterhorn
already has is a really good thing. But, of course, this is only my opinion.
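
For comparison, content-based detection (the "magic number" approach behind JMimeMagic and one of mime-util's detectors, and what James expects from Tika's MimeTypes) needs no extra dependency for a handful of formats. This is a hand-rolled sniffer, not the API of any of those libraries; the class name is hypothetical and it knows only three signatures: MPEG start codes, the ISO base media "ftyp" box, and an ID3v2 tag.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical dependency-free sniffer that guesses a MIME type from header bytes. */
public class MagicSniffer {

    /** Guesses a MIME type from the first bytes of a file. */
    public static String sniff(byte[] h) {
        // MPEG program stream pack header (00 00 01 BA) or
        // MPEG-1/2 video sequence header (00 00 01 B3).
        if (h.length >= 4 && h[0] == 0 && h[1] == 0 && h[2] == 1
                && ((h[3] & 0xFF) == 0xBA || (h[3] & 0xFF) == 0xB3)) {
            return "video/mpeg";
        }
        // ISO base media (MP4 family): box type "ftyp" at bytes 4..7.
        if (h.length >= 8 && new String(h, 4, 4, StandardCharsets.US_ASCII).equals("ftyp")) {
            return "video/mp4"; // could be audio-only (e.g. .m4a); needs track info to decide
        }
        // An ID3v2 tag ("ID3") usually, but not always, means MP3 audio.
        if (h.length >= 3 && h[0] == 'I' && h[1] == 'D' && h[2] == '3') {
            return "audio/mpeg";
        }
        return "application/octet-stream";
    }

    /** Reads up to 16 header bytes from a file and sniffs them. */
    public static String sniff(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return sniff(in.readNBytes(16));
        }
    }
}
```

A real library ships hundreds of such signatures, which is the trade-off against its extra weight.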

Rubén Pérez
TELTEK Video Research
www.teltek.es



2012/11/28 James Perrin <[email protected]>

> Hi,
>
>   I had a look at the following issue about MPEG videos not being
> correctly identified:
> http://opencast.jira.com/browse/MH-8288
>
>   Though the immediate solution was quite simple, it raised some questions
> about whether MIME type identification is being done correctly and whether
> it needs reviewing. I've no experience in this area, so please correct me.
>
> The MediaInspectionServiceImpl is meant to use Apache Tika for the initial
> inspection of files. I don't know anything about Tika, but it seemed to
> obtain the MIME type in rather an odd way. The extractContentType()
> function passes the input file as a stream to a Tika parser, which returns
> a metadata object; the MIME type is then read from the HTTP-header
> Content-Type field of that metadata. OK, that may work.
>
> However, in inspectTrack(), which calls extractContentType(), there is a
> comment saying the library doesn't detect audio and video metadata!? Indeed,
> in the issue I was looking at, it returned application/octet-stream.
>
> The code then falls back to Opencast's own MimeType class, which matches
> the MIME type by file extension (this is where the original problem lay:
> the extension was associated with multiple MIME types).
>
> This may be one way of using Tika, but there is a more direct method using
> Tika's MimeTypes class. It looks like the Tika library should be quite
> capable of detecting the MIME type correctly from the file contents. Could
> we just replace the Opencast MimeType[s] classes altogether?
>
> Regards
> James
>
>
> --
> ------------------------------------------------------------------------
>  James S. Perrin
>
>  Media Technologies Team
>  Devonshire House, University Precinct
>  The University of Manchester
>  Oxford Road, Manchester, M13 9PL
>
>  t: +44 (0) 161 275 6945
>  e: [email protected]
>  w: www.manchester.ac.uk/researchcomputing
> ------------------------------------------------------------------------
> "The test of intellect is the refusal to belabour the obvious"
> - Alfred Bester
> ------------------------------------------------------------------------
> _______________________________________________
> Matterhorn mailing list
> [email protected]
> http://lists.opencastproject.org/mailman/listinfo/matterhorn
>
>
> To unsubscribe please email
> [email protected]
> _______________________________________________
>