Hi,
The Tika MIME type detection code has improved greatly since I last
looked at it a while ago. The root-XML-based detection and
ContainerAwareDetector are things we (Aperture) have wanted to do
ourselves since at least 2007 but never got round to :)
Unfortunately, there are many subtle differences between the Tika and
Aperture MIME definition files, and switching outright would break
existing Aperture applications. I'd therefore like to implement a
temporary solution that works in the interim and allows for gradual
migration.
The idea is to first create a normal MimeTypes instance:

    MimeTypes mimeTypes = MimeTypesFactory.create("tika-mimetypes.xml");

then delete some definitions with something like:

    // in Tika this is an msg file
    // in Aperture this is a pst file - clearly wrong, but...
    mimeTypes.deleteMimeType("application/vnd.ms-outlook");

and then read our own definitions file on top:

    new MimeTypesReader(mimeTypes).read(inputStreamFromOurFile);
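
To make the intended result concrete, here is a minimal sketch of the
layering using plain Java maps instead of Tika's classes. Everything in
it is hypothetical: MimeOverrideSketch, layer() and the glob patterns
are illustration only and are not part of Tika or Aperture. It only
shows the order of operations (copy the base definitions, drop the
conflicting types, then add the legacy ones).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a MIME registry: type name -> glob patterns.
// This does not use or mimic any real Tika API.
class MimeOverrideSketch {

    /**
     * Returns a new registry built from base, without the types listed in
     * removed, and with every entry of legacy added afterwards. The input
     * maps are left unmodified.
     */
    static Map<String, List<String>> layer(Map<String, List<String>> base,
                                           Set<String> removed,
                                           Map<String, List<String>> legacy) {
        // Step 1: start from a copy of the base definitions.
        Map<String, List<String>> result = new LinkedHashMap<>(base);
        // Step 2: delete the definitions that conflict.
        result.keySet().removeAll(removed);
        // Step 3: read the legacy definitions on top.
        result.putAll(legacy);
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> base = new LinkedHashMap<>();
        base.put("application/vnd.ms-outlook", List.of("*.msg"));
        base.put("application/pdf", List.of("*.pdf"));

        Map<String, List<String>> legacy = new LinkedHashMap<>();
        legacy.put("application/vnd.ms-outlook", List.of("*.pst"));

        Map<String, List<String>> merged =
                layer(base, Set.of("application/vnd.ms-outlook"), legacy);
        System.out.println(merged);
    }
}
```

With plain maps the explicit delete step is redundant, because putAll
already overwrites. It is kept here because a reader that augments
existing definitions rather than replacing them would otherwise leave
the old msg rules attached to the type.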
Questions:
0. Does this make sense? Am I missing something?
1. There is no deleteMimeType method at the moment. Is it possible to
delete a MIME type definition from a MimeTypes instance in some other
way? I just wanted to ask before trying to implement it myself.
2. The MimeTypesReader class is not public. Is there any particular
reason for that? The code seems to augment the existing definitions
rather than replace them, so it would suit our use case if it were
accessible.
3. It seems that there is a rule that all minor types (subtypes) either
begin with x- or are IANA-approved. Please confirm.
4. It also seems that your MIME definition file is maintained
independently of the one at freedesktop.org. In other words, there is
no policy like "first submit to freedesktop, wait until they approve
and commit, and then update the Tika definitions". Please confirm.
Antoni Myłka
[email protected]