Hi,

I'd like to propose an Apache Tika[1] connector for Apache Camel.  I see
Camel already uses some of the libraries Tika builds on, such as PDFBox,
but it could be useful to have Tika's full assortment of file parsers
available for converting files to text.

The basic configuration would offer two operations: MIME type detection
and parsing files to text.

tika:detect

File/InputStream -> camel-tika -> MIME type

tika:parse

File/InputStream -> camel-tika -> OutputStream of text
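To make the two operation contracts concrete, here is a rough, dependency-free Java sketch. It is only an illustration: the class and method names are made up, and the detection is a toy magic-byte check standing in for Tika's own detector.  A real camel-tika component would presumably delegate to Tika itself, for example the org.apache.tika.Tika facade's detect() and parseToString() methods.  Only plain text is "parsed" here; anything else would be handed to Tika's parsers (PDFBox and friends).

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical toy stand-in for the proposed tika:detect and tika:parse
 * operations. Not Tika's implementation: a real component would delegate
 * to Tika's detectors and parsers instead of this hand-rolled logic.
 */
public class TikaEndpointSketch {

    // A few well-known magic-byte signatures (prefix -> MIME type).
    private static final byte[] PDF = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] ZIP = {'P', 'K', 3, 4};
    private static final byte[] PNG = {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'};

    /** tika:detect sketch: InputStream -> MIME type, by sniffing the first 8 bytes. */
    public static String detect(InputStream in) throws IOException {
        byte[] head = in.readNBytes(8);
        if (startsWith(head, PDF)) return "application/pdf";
        if (startsWith(head, ZIP)) return "application/zip";
        if (startsWith(head, PNG)) return "image/png";
        // No known signature: treat the sample as binary if it contains a NUL byte.
        for (byte b : head) {
            if (b == 0) return "application/octet-stream";
        }
        return "text/plain";
    }

    /** tika:parse sketch: InputStream -> text written to an OutputStream. */
    public static void parse(InputStream in, OutputStream out) throws IOException {
        BufferedInputStream buffered = new BufferedInputStream(in);
        buffered.mark(16);               // remember the start so detection can rewind
        String type = detect(buffered);
        buffered.reset();
        if (!type.equals("text/plain")) {
            // A real component would pass PDFs, Office files, etc. to Tika's parsers.
            throw new UnsupportedOperationException("no parser in this sketch for " + type);
        }
        buffered.transferTo(out);        // plain text is copied through unchanged
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(data, 0, prefix.length, prefix, 0, prefix.length);
    }

    public static void main(String[] args) throws IOException {
        byte[] pdf = "%PDF-1.7 ...".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(new ByteArrayInputStream(pdf)));  // application/pdf

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        parse(new ByteArrayInputStream("hello camel".getBytes(StandardCharsets.UTF_8)), out);
        System.out.println(out.toString(StandardCharsets.UTF_8));  // hello camel
    }
}
```

Detection rewinds the stream with mark()/reset() before parsing, which mirrors the requirement that the detect step not consume input the parse step still needs.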

I have a basic implementation that I'd be happy to send as a PR, but I
wanted to see whether this is something the community is interested in.
I think it would be interesting to combine a project that integrates
everything with the project that parses everything.  I also think a
camel-tika component might help achieve some of the Tika 2.0 goals[2].


- Bob Paulin


[1] https://tika.apache.org/

[2] https://wiki.apache.org/tika/Tika2_0RoadMap

