Hi Folks,
I can't help but think that we REALLY do not need all of the dependencies
that Tika brings in.
We are NOT trying to parse all of the MediaTypes supported by Tika; we are
merely after a subset [0] including (X)HTML, OWL and maybe a few more...
Is there any interest in trying to:

   1. Contribute the 'purifier' Interface [1] and implementation [2] over
   to Tika
   2. Review the mime implementations we maintain [3]. My personal feeling
   is that we have too much custom logic for accurate MimeType detection in
   Any23. This could easily be inherited from Tika.

Any thoughts, folks?
Lewis

[0] https://github.com/apache/any23/blob/master/mime/src/main/resources/org/apache/any23/mime/mimetypes.xml#L25
[1] https://github.com/apache/any23/tree/master/api/src/main/java/org/apache/any23/mime/purifier
[2] https://github.com/apache/any23/tree/master/mime/src/main/java/org/apache/any23/mime/purifier
[3] https://github.com/apache/any23/tree/master/mime/src/main/java/org/apache/any23/mime
-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc
