Is there a way to programmatically register new MIME types? We have a way
to plug in new parsers, but I do not see a way to define new file types.
I'd like to contribute both the MIME type definitions and the Parser
implementations that handle them in a single plugin jar file. The code to
update MIME types exists in org.apache.tika.mime.MimeTypesReader, but that
class is package-private. I would like it to be made public, or for Tika to
provide another class, like the one attached, that exposes its
functionality. The key is that I want to keep the standard MIME types and
just append or override a few of my own. I currently append to the MIME
types using:

    MimeTypes types = _tikaConfig.getMimeRepository();
    MimeTypesAppender appender = new MimeTypesAppender(types);
    appender.append(mimeDoc);

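For context, mimeDoc above is an org.w3c.dom.Document. A minimal sketch of
how such a document might be built is below. It assumes the same mime-info
layout that tika-mimetypes.xml uses; the class name, the type
application/x-example and the glob *.example are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class MimeDocExample {
    // Hypothetical definition: the type name and glob are placeholders,
    // not types that Tika ships with.
    static final String XML =
          "<mime-info>"
        + "<mime-type type=\"application/x-example\">"
        + "<glob pattern=\"*.example\"/>"
        + "</mime-type>"
        + "</mime-info>";

    // Parses the XML above into a DOM Document using only JDK classes.
    public static Document buildMimeDoc() throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
    }
}
```

The resulting Document is what would be handed to appender.append(...).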
I realize that I can copy the tika-mimetypes.xml file and add my own types,
but that requires me to maintain one master file and to update it every
time someone on my team adds or removes a parser. I then run the risk of
getting out of sync with the file distributed with Tika. I think a better
approach might be to add another META-INF/ file that contains the extra MIME
types that Tika should load.
org.apache.tika.config.ServiceLoader.findServiceResources hints at this
approach, but it doesn't appear to be in place:
MimeTypes.getDefaultMimeTypes() just loads a single file.
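
To illustrate the META-INF/ idea, here is a rough sketch that uses only JDK
classes. It collects every copy of an extra definitions file visible on the
classpath, so each plugin jar could ship its own. The resource name
META-INF/tika-extra-mimetypes.xml and the class ExtraMimeTypesLocator are
hypothetical; nothing in Tika defines them today.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class ExtraMimeTypesLocator {
    // Hypothetical resource name; Tika does not look for this file.
    public static final String RESOURCE = "META-INF/tika-extra-mimetypes.xml";

    // Returns one parsed Document per copy of RESOURCE that the given
    // class loader can see (typically one per plugin jar).
    public static List<Document> loadAll(ClassLoader loader)
            throws IOException, ParserConfigurationException, SAXException {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        List<Document> docs = new ArrayList<>();
        for (URL url : Collections.list(loader.getResources(RESOURCE))) {
            try (InputStream in = url.openStream()) {
                docs.add(builder.parse(in));
            }
        }
        return docs;
    }
}
```

Each returned Document could then be passed to something like
MimeTypesAppender.append(Document) after the standard types are loaded.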
-Tom

package org.apache.tika.mime;

import java.io.IOException;
import java.io.InputStream;

import org.w3c.dom.Document;

/**
 * Works around the fact that the MimeTypesReader class is package scope.
 */
public class MimeTypesAppender {

    private final MimeTypesReader _reader;

    public MimeTypesAppender(MimeTypes types) {
        this._reader = new MimeTypesReader(types);
    }

    public void append(Document doc) throws MimeTypeException {
        _reader.read(doc);
    }

    public void append(InputStream is) throws MimeTypeException, IOException {
        _reader.read(is);
    }
}