Hi Roger,

Thanks for your interest in Tika! In a nutshell, Tika is a content extraction tool. You can use it to extract metadata and text from files, identify the language a text is written in, and translate text using internet APIs (for now; we're working on machine translation). We're in the process of releasing version 1.6.

Tika in Action is a book written by Chris Mattmann, the lead and co-creator of Tika. You can find more info at [0].

You can use Tika in several ways:

*1. tika-app jar*. Download a release from tika.apache.org and run `java -jar tika-app.jar [some file]`.

*2. GUI*. Run `java -jar tika-app.jar --gui`. A graphical interface will pop up; drag a file into the window.

*3. Tika server*. Run the Tika server jar, which is separate from tika-app: `java -jar tika-server.jar`. Then try one of the commands from [1] (e.g. `curl -X PUT -d @example.csv http://localhost:9998/meta --header "Content-Type: text/csv"`).

*4. Java API*. Check out an example of using Parser.parse() at [2].

Hope that helps!
Tyler

[0] - http://www.manning.com/mattmann/
[1] - http://wiki.apache.org/tika/TikaJAXRS
[2] - https://github.com/tpalsulich/TikaExamples

On Wed, Aug 6, 2014 at 11:04 PM, Alex Ott wrote:

> I think "Tika in Action" is still current...
>
>
> On Wed, Aug 6, 2014 at 11:03 PM, Roger Carter wrote:
>
> > Hi Everyone,
> >
> > I'm new to the Apache scene; I have experience with Matlab and minimal
> > experience with Python. This seems like a powerful tool and I'd like to
> > learn more. If anyone is willing to provide recommendations for resources
> > or detail their experiences in learning Tika, I would be most grateful.
> >
> > Thanks,
> > Roger
> >
> >
> --
> With best wishes, Alex Ott
> http://alexott.net/
> Twitter: alexott_en (English), alexott (Russian)
> Skype: alex.ott
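
P.S. Roger, since you mentioned some Python experience: here is a rough sketch of how the curl command from item 3 might look from Python, using only the standard library. It assumes a Tika server is already running on localhost:9998, as in that curl example. The function names `build_meta_request` and `fetch_metadata` are made up for illustration and are not part of Tika. The sketch makes no assumption about the response format and simply returns whatever text the server sends back.

```python
import urllib.request

# Assumption: a Tika server is listening here (the host and port used in the curl example).
TIKA_URL = "http://localhost:9998"


def build_meta_request(data, content_type, base_url=TIKA_URL):
    """Build (but do not send) a PUT request to the /meta endpoint.

    Mirrors the curl example: PUT the raw file bytes with a Content-Type header.
    """
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/meta",
        data=data,
        method="PUT",
        headers={"Content-Type": content_type},
    )


def fetch_metadata(path, content_type, base_url=TIKA_URL):
    """Send the file at `path` to the server and return the response body as text."""
    with open(path, "rb") as f:
        request = build_meta_request(f.read(), content_type, base_url)
    # This call needs the server to be running; it raises URLError otherwise.
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")
```

With the server running and an example.csv in the current directory, `print(fetch_metadata("example.csv", "text/csv"))` should print the metadata the server reports for that file.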
