Hi,
I'd like to propose a new Tika App for the 2.0 branch. One of the
reasons we broke the Tika parsers apart into modules was the
complexity of dealing with all the parser dependencies and their
transitive dependencies. Now developers can use just the modules they
want without pulling in the kitchen sink along with them. Unfortunately,
this approach doesn't simplify the problem in the tika-parser or tika-app
projects, where the whole kitchen sink comes together again. This is a
difficult problem, but I think it's one that the Apache Felix [1] project
has done a good job of solving. I've described the approach and provided
an implementation on my GitHub [2]; please see the README for details.
I'd like to get a sense from the community of whether this is a direction
we'd like to go in, since it involves bringing in another stack. If we want
to move in this direction, I'm happy to move it into the Tika 2.0 branch.
I think this approach opens the door to some cool features like plugins,
and will allow the modules to upgrade more aggressively because there is
less pressure to keep their dependencies matched up.
I've created a JIRA [3]. I'm happy to take feedback there or on this
thread.
- Bob
[1] http://felix.apache.org/
[2] https://github.com/bobpaulin/tika-app-osgi
[3] https://issues.apache.org/jira/browse/TIKA-2076