Hi,

I was probably the last person to work on Tika and MIME type detection. I improved the MIME type detection a bit and updated Tika to version 1.1. With version 1.1, Tika introduced an OSGi library. Because of that, using it in the code through a service dependency looks a bit strange, but it is actually better than before: we no longer have to include Tika in every bundle.
Tika is a really good library for MIME type detection, but unfortunately not for audio and video at the moment, which is why Matterhorn ships its own MIME type detection. And yes, there are various other MIME type detection libraries that could be used. So the best approach may be to use several of these libraries together and wrap them in a Util class, because at the moment it is not clear which MIME type detection is used where in the code.

Lukas Rohner

On 29.11.2012 at 10:19, Rubén Pérez <[email protected]> wrote:

> I have just checked THIS PAGE. From what I'm seeing, it seems to me that using
> Tika for *only* MimeType detection is like "killing flies with a cannon". The
> "real" metadata detection is done with MediaInfo, which is one of the best
> metadata detection tools, and its only flaw (for what we want) is that it does
> not return the MimeTypes.
>
> If I were you, I would switch to a library with fewer dependencies and more
> specific than Tika. The activation .jar is already used in Matterhorn, and
> the "fileNameMap" method seems to be Java's native approach to the issue. On
> the other hand, the JMimeMagic or mime-util approaches seem lightweight,
> efficient and to the point. Of course, I didn't test them myself, it's just
> the general impression.
>
> Anything that can help us to prune the thick bush of dependencies that
> Matterhorn already has is a really good thing. But, of course, this is only
> my opinion.
>
> Rubén Pérez
> TELTEK Video Research
> www.teltek.es
>
>
>
> 2012/11/28 James Perrin <[email protected]>
> Hi,
>
> I had a look at the following issue about video mpegs not being correctly
> identified. http://opencast.jira.com/browse/MH-8288
>
> Though the immediate solution was quite simple, it raised some questions
> about whether mimetype identification is being done correctly and needs
> reviewing. I've no experience in this area so please correct me.
>
> The MediaInspectionServiceImpl is meant to make use of Apache Tika for
> initial inspection of files. I don't know anything about Tika but it seemed
> to attempt to get the mimetype in rather an odd way. The extractContentType()
> function gives the input file as a stream to a Tika parser, which then returns
> a metadata object from which the mimetype is obtained by querying the
> Content-Type HTTP header in the metadata. OK, that may work.
>
> However, in inspectTrack(), which calls extractContentType(), there is a
> comment saying the library doesn't detect audio and video metadata!? Indeed,
> in the issue I was looking at it returned application/octet-stream.
>
> The code then defaults to using Opencast's own MimeType class, which matches
> the mimetype by file extension (this is where the original problem was, with
> the extension associated with multiple mimetypes).
>
> This may be a way of using Tika, but there is a more direct method using the
> Tika MimeTypes class. It looks like the Tika library should be quite capable
> of detecting the mimetype correctly from the file. Could we just replace the
> Opencast mimetype[s] classes altogether?
>
> Regards
> James
>
>
> --
> ------------------------------------------------------------------------
> James S. Perrin
>
> Media Technologies Team
> Devonshire House, University Precinct
> The University of Manchester
> Oxford Road, Manchester, M13 9PL
>
> t: +44 (0) 161 275 6945
> e: [email protected]
> w: www.manchester.ac.uk/researchcomputing
> ------------------------------------------------------------------------
> "The test of intellect is the refusal to belabour the obvious"
> - Alfred Bester
> ------------------------------------------------------------------------
> _______________________________________________
> Matterhorn mailing list
> [email protected]
> http://lists.opencastproject.org/mailman/listinfo/matterhorn
>
>
> To unsubscribe please email
> [email protected]
> _______________________________________________
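
A minimal sketch of the "several detectors behind one Util class" idea from the reply at the top of this thread, in case it helps the discussion. It uses only JDK classes so that it runs without extra dependencies. The class name MimeTypeUtil, the Detector interface and the small extension table are all hypothetical and not taken from Matterhorn. The JDK detectors merely stand in for whatever libraries (Tika, mime-util, Matterhorn's own MimeType class) would actually be plugged in. The one piece of behaviour it tries to capture from the thread is that a generic application/octet-stream answer should not stop the search.

```java
import java.io.IOException;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper; name and API are illustrative, not Matterhorn code. */
final class MimeTypeUtil {

  /** Generic answer that should never win over a more specific one. */
  static final String GENERIC = "application/octet-stream";

  /** One detection strategy, e.g. a wrapper around a library or a lookup table. */
  interface Detector {
    Optional<String> detect(Path file) throws IOException;
  }

  /** Tiny illustrative table; a stand-in for an extension-based MimeType class. */
  private static final Map<String, String> BY_EXTENSION = new HashMap<>();
  static {
    BY_EXTENSION.put("mpg", "video/mpeg");
    BY_EXTENSION.put("mpeg", "video/mpeg");
    BY_EXTENSION.put("mp4", "video/mp4");
    BY_EXTENSION.put("mp3", "audio/mpeg");
  }

  /** Platform probing from the JDK; a stand-in for a library-based detector. */
  static final Detector PROBE = file ->
      Files.exists(file)
          ? Optional.ofNullable(Files.probeContentType(file))
          : Optional.<String>empty();

  /** The JDK file name map ("fileNameMap") mentioned in the thread. */
  static final Detector FILE_NAME_MAP = file ->
      Optional.ofNullable(URLConnection.getFileNameMap().getContentTypeFor(nameOf(file)));

  /** Lookup by lower-cased file extension in the table above. */
  static final Detector EXTENSION_TABLE = file -> {
    String name = nameOf(file);
    int dot = name.lastIndexOf('.');
    if (dot < 0) {
      return Optional.empty();
    }
    String extension = name.substring(dot + 1).toLowerCase(Locale.ROOT);
    return Optional.ofNullable(BY_EXTENSION.get(extension));
  };

  private MimeTypeUtil() {
  }

  /** Returns the first specific type found, or GENERIC if no detector has one. */
  static String detect(Path file, List<Detector> detectors) {
    for (Detector detector : detectors) {
      try {
        Optional<String> type = detector.detect(file);
        if (type.isPresent() && !GENERIC.equals(type.get())) {
          return type.get();
        }
      } catch (IOException e) {
        // A failing detector should not abort detection; try the next one.
      }
    }
    return GENERIC;
  }

  /** File name as a string; empty for paths without a name element (e.g. a root). */
  private static String nameOf(Path file) {
    Path name = file.getFileName();
    return name == null ? "" : name.toString();
  }
}
```

A caller would pass the detectors in order of trust, for example MimeTypeUtil.detect(Paths.get("lecture.mpg"), Arrays.asList(MimeTypeUtil.PROBE, MimeTypeUtil.FILE_NAME_MAP, MimeTypeUtil.EXTENSION_TABLE)). What the first two detectors return depends on the platform and JDK version, which is one more argument for keeping the order in a single place.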
