Hi Folks,

I can't help but think that we REALLY do not need all of the dependencies that Tika brings in. We are NOT trying to parse all of the MediaTypes supported by Tika; we are merely after a subset [0] including (X)HTML, OWL and maybe a few more...

Is there any interest in trying to

1. Contribute the 'purifier' interface [1] and implementation [2] over to Tika
2. Review the mime implementations we maintain [3]. My personal feeling is that we have too much custom logic for accurate MimeType detection in Any23. This could easily be inherited from Tika.

Any thoughts folks?

Lewis

[0] https://github.com/apache/any23/blob/master/mime/src/main/resources/org/apache/any23/mime/mimetypes.xml#L25
[1] https://github.com/apache/any23/tree/master/api/src/main/java/org/apache/any23/mime/purifier
[2] https://github.com/apache/any23/tree/master/mime/src/main/java/org/apache/any23/mime/purifier
[3] https://github.com/apache/any23/tree/master/mime/src/main/java/org/apache/any23/mime

--
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc
