Hi,

The Tika mime type detection code has improved greatly since I last looked at it a while ago. The root-XML-based detection and ContainerAwareDetector are things we (Aperture) have wanted to do ourselves since at least 2007 but never got round to :)

Unfortunately, there are many subtle differences between your mime definition file and ours, and switching outright would break existing Aperture applications. Therefore I'd like to implement a temporary solution that works in the interim and allows for gradual migration.

First, create a normal MimeTypes instance:
MimeTypes mimeTypes = MimeTypesFactory.create("tika-mimetypes.xml");

Then delete some definitions:
mimeTypes.deleteMimeType("application/vnd.ms-outlook"); // no such method yet, see question 1
// in Tika this is an .msg file
// in Aperture this is a .pst file - clearly wrong, but...

Finally, read our own definitions file on top:
new MimeTypesReader(mimeTypes).read(inputStreamFromOurFile);
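The three steps above can be modeled without Tika at all. What follows is a minimal sketch, assuming a hypothetical LayeredMimeRegistry class (not part of Tika) in which loading a second definitions file merges glob patterns into existing types rather than replacing them, which is how MimeTypesReader appears to behave. It illustrates why the delete step matters: without it, the overlaid type keeps Tika's *.msg glob alongside ours.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical stand-in for a mime registry (NOT Tika's MimeTypes API).
 * Each type maps to its glob patterns; a later definitions file augments
 * existing entries instead of replacing them.
 */
final class LayeredMimeRegistry {
    private final Map<String, Set<String>> globsByType = new LinkedHashMap<>();

    /** Merges the given globs into the type's existing globs (augment, never replace). */
    void augment(String type, String... globs) {
        Set<String> existing = globsByType.computeIfAbsent(type, t -> new LinkedHashSet<>());
        Collections.addAll(existing, globs);
    }

    /** Removes the type and all of its globs; returns true if it was defined. */
    boolean delete(String type) {
        return globsByType.remove(type) != null;
    }

    /** Returns the globs currently registered for the type (empty if undefined). */
    Set<String> globs(String type) {
        return globsByType.getOrDefault(type, Collections.emptySet());
    }

    public static void main(String[] args) {
        String outlook = "application/vnd.ms-outlook";

        // Overlay without deleting: Tika's glob survives next to ours.
        LayeredMimeRegistry naive = new LayeredMimeRegistry();
        naive.augment(outlook, "*.msg"); // base definition (Tika's meaning)
        naive.augment(outlook, "*.pst"); // our definition
        System.out.println(naive.globs(outlook)); // [*.msg, *.pst]

        // Delete first, then overlay: only our meaning remains.
        LayeredMimeRegistry layered = new LayeredMimeRegistry();
        layered.augment(outlook, "*.msg");
        layered.delete(outlook);
        layered.augment(outlook, "*.pst");
        System.out.println(layered.globs(outlook)); // [*.pst]
    }
}
```

A real registry would also carry magic bytes, root-XML rules and sub-class relations, all of which would need the same delete-then-overlay treatment.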

Questions:
0. Does this make sense? Am I missing something?
1. There is no deleteMimeType method. Is it possible to delete a mime type definition from a MimeTypes instance? I just wanted to ask before trying to implement it myself.
2. The MimeTypesReader class is not public. Is there any particular reason for that? The reader seems to augment, not replace, the existing definitions, so it seems suitable for our use case.
3. It seems that there is a rule that every minor type either begins with "x-" or is IANA-approved. Please confirm.
4. It also seems that your mime definition file is independent of the one at freedesktop.org, i.e. there is no policy like "first submit to freedesktop, wait until they approve and commit, and then update the Tika definitions". Please confirm.
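The rule in question 3 can be stated as a simple check. This is a minimal sketch, assuming the rule is exactly as described (the minor type starts with "x-", or the full type appears in a caller-supplied set of IANA-registered names). The MinorTypeRule class and the sample type names are hypothetical, the sketch does not consult the IANA registry itself, and Set.of needs Java 9 or later.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical checker for the minor-type naming rule (question 3). */
final class MinorTypeRule {
    /**
     * True if the minor type of a "major/minor" string starts with "x-",
     * or if the whole type appears in the caller-supplied registered set
     * (registered names are expected in lower case).
     */
    static boolean followsRule(String mimeType, Set<String> registered) {
        String normalized = mimeType.trim().toLowerCase(Locale.ROOT);
        int slash = normalized.indexOf('/');
        if (slash <= 0 || slash == normalized.length() - 1) {
            throw new IllegalArgumentException("not a major/minor type: " + mimeType);
        }
        String minor = normalized.substring(slash + 1);
        return minor.startsWith("x-") || registered.contains(normalized);
    }

    public static void main(String[] args) {
        // Sample "registered" set for illustration only; not a copy of the IANA registry.
        Set<String> registered = Set.of("text/plain");
        System.out.println(followsRule("text/plain", registered));                 // true
        System.out.println(followsRule("application/x-hypothetical", registered)); // true
        System.out.println(followsRule("application/hypothetical", registered));   // false
    }
}
```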

Antoni Myłka
