Hey Nick,
Thanks for the thoughts. Just to clear a few things up: the version of
the app on my GitHub already includes all the parsers that the current
app does. If you build it and run --list-parsers you'll see them
there. As for the desire to quickly test new bits, I think much of the
OSGi stuff has been abstracted away. For an example, see the example
folder [1]. The only additions are the Activator class (which is
identical for all the current bundles) and the maven-bundle-plugin in
the pom.xml. But don't take my word for it; why not give it a spin?
As for the use cases, consider that whenever we upgrade or add
parsers/detectors/encodingdetectors/languagedetectors we may introduce
new dependencies or new versions. For example, the pom for the tika-app
currently pulls in 3 different versions of commons-io, 2 versions of
commons-codec, and 2 versions of Guava. Maven resolves to just one
version of each in the final build, but the effect is that every part of
the code must work with the selected version. In the OSGi version of
tika-app the modules can have different versions of the dependencies
within the same app. Also, for TIKA-1285 [2] it would have been possible
to support 2 different versions of PDFBox in different OSGi bundles. So
I see it as more of a gain, but I'd be interested in hearing if there is
any degradation in the development experience.
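To make the difference concrete, here is a minimal sketch in plain Java
(no OSGi involved, just maps). The module names and version numbers are
made up for illustration and are not Tika's real ones, and "first
declaration wins" is a simplification of how Maven actually picks a
version.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DependencyVersions {

    /** One declared dependency: which module asks for which artifact version. */
    static final class Dep {
        final String module;
        final String artifact;
        final String version;

        Dep(String module, String artifact, String version) {
            this.module = module;
            this.artifact = artifact;
            this.version = version;
        }
    }

    /** Hypothetical declarations; these are not Tika's real modules or versions. */
    static List<Dep> sample() {
        return List.of(
            new Dep("parser-a", "commons-io", "2.4"),
            new Dep("parser-b", "commons-io", "2.5"),
            new Dep("parser-b", "guava", "19.0"));
    }

    /**
     * Flat classpath: only one version per artifact survives.
     * "First declaration wins" is a simplification of Maven's mediation.
     */
    static Map<String, String> flatClasspath(List<Dep> deps) {
        Map<String, String> resolved = new LinkedHashMap<>();
        for (Dep d : deps) {
            resolved.putIfAbsent(d.artifact, d.version);
        }
        return resolved;
    }

    /** Bundle-style view: every module keeps the versions it declared. */
    static Map<String, Map<String, String>> perModule(List<Dep> deps) {
        Map<String, Map<String, String>> resolved = new LinkedHashMap<>();
        for (Dep d : deps) {
            resolved.computeIfAbsent(d.module, k -> new LinkedHashMap<>())
                    .put(d.artifact, d.version);
        }
        return resolved;
    }

    public static void main(String[] args) {
        List<Dep> deps = sample();
        System.out.println("flat:       " + flatClasspath(deps));
        System.out.println("per module: " + perModule(deps));
    }
}
```

In the flat view parser-b ends up running against commons-io 2.4 even
though it declared 2.5; in the per-module view each module keeps the
version it asked for, which is the behaviour I'm after with the bundles.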
- Bob
[1]
https://github.com/bobpaulin/tika-app-osgi/tree/master/examples/dummy-parser-bundle
[2] https://issues.apache.org/jira/browse/TIKA-1285
On 9/13/2016 3:38 PM, Nick Burch wrote:
On Sun, 11 Sep 2016, Bob Paulin wrote:
I'd like to propose a new Tika App for the 2.0 branch. One of the
reasons we broke apart the Tika parsers into modules was due to the
complexity of having to deal with all the parser dependencies and
transitive dependencies. Now developers can use just the modules
they want without pulling the kitchen sink with it. Unfortunately
this approach doesn't simplify the problem in the tika-parser or
tika-app project where the whole kitchen sink comes together again.
One of the nice things about the tika app (and server) is you do get
everything, so it's very easy to test and get started with!
Another nice thing is that you can test small changes (eg a new parser
or a new mime type) quite quickly, just by using the tika app jar on
your classpath along with your customisation. Makes it very easy to
try out new things if you're a new developer, and I find usually
easier than firing up Eclipse if I just want to try a new mime type
change for someone.
More modular versions of the Tika server I could certainly get behind,
if we haven't already done so!
For the app, are there that many use cases for it where you might only
want some of Tika? (Most people calling Tika from another language
would likely be better off with the server, to avoid the JVM
start/stop overhead).
Would the new OSGi version make it harder for people to test new bits
with Tika? For one example, whenever we've done a hackathon and are
helping people with a new parser, helping them get their new parser
used with just the app is about do-able. I fear if we made them also
learn OSGi + build a bundle, at that stage when they're trying to do a
"hello world", we'd lose them :/
The GitHub project does look interesting though! I'd hate for us to
get a few shiny new bits but lose some key bits important for
newbies / quick-win developers in the process...
Nick