>> If we actually moved the source files out of tika-parsers and into the
>> modules this would not be required.
Of course, got it...now. Thank you. In Tika 2.0, would we want to actually
move the source code for the bundles into the bundles, I wonder. Or, would we
still want to have one massive tika-parsers module as we do now?
>> Can you provide the source for your test?
Source?! Ha. No, I literally/manually copied the jars into bin... the horror!
I'm sure maven would have added the transitive dependencies. I'll give that a
try. Thank you, again.
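
A quick way to see which classes are missing from a hand-assembled bin directory is a small probe run with the same classpath. This is only a sketch: ClasspathProbe is a made-up helper name, TikaCLI is the class from the command quoted below, and org.apache.commons.logging.LogFactory is an assumption about what "Apache logging" refers to.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: report which class names cannot be loaded from the current classpath.
class ClasspathProbe {

    /** Returns the names from classNames that are not loadable. */
    static List<String> missing(List<String> classNames) {
        List<String> absent = new ArrayList<>();
        ClassLoader loader = ClasspathProbe.class.getClassLoader();
        for (String name : classNames) {
            try {
                // initialize=false: only locate the class, do not run static initializers
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                absent.add(name);
            }
        }
        return absent;
    }

    public static void main(String[] args) {
        // Assumed names: the second one is a guess at the missing logging class.
        List<String> wanted = List.of(
                "org.apache.tika.cli.TikaCLI",
                "org.apache.commons.logging.LogFactory");
        for (String name : missing(wanted)) {
            System.out.println("not on classpath: " + name);
        }
    }
}
```

Run it as java -cp "bin/*:." ClasspathProbe (use ";" instead of ":" on Windows). Anything it prints is a jar that still has to be copied in by hand, or pulled in transitively by maven.
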
Best,
Tim
On Mon, Aug 3, 2015 at 1:56 PM, Allison, Timothy B. <[email protected]>
wrote:
> Bob,
> Thank you for leading this effort. Please continue to forgive my lack
> of OSGi knowledge in the following :).
> I just built your strawman, and it looks great...I think(?)
>
> To confirm that I understand correctly...
>
> For non-OSGi folks, let's say I know that I'm only going to be parsing
> images, to replicate the features of tika-app but with only the image
> parsers, would I put these three jars in a bin directory (or add them via
> maven to my project):
>
> original-tika-app-1.10-SNAPSHOT.jar
> tika-image-parser-bundle-1.10-SNAPSHOT.jar
> tika-core-1.10-SNAPSHOT.jar
>
> and then type:
> java -cp "bin/*" org.apache.tika.cli.TikaCLI -then -the -commandline
> -options
>
>
> If this is the goal, why is tika-parsers "provided" in
> tika-image-parser-bundle's pom? Aren't all dependencies of the image
> parsers packaged within that bundle?
>
> When I actually tried this, I got a NoClassDefFoundError for Apache logging.
> Should we add that within the image-parser-bundle? Or should I be doing
> something else (answer= very likely :) ).
>
> Thank you, again.
>
> Best,
>
> Tim
>
>
> -----Original Message-----
> From: Bob Paulin [mailto:[email protected]]
> Sent: Saturday, July 25, 2015 9:07 PM
> To: [email protected]
> Subject: [DISCUSS] A more modular parser project
>
> Hi,
>
> At ApacheCon this year there was some discussion of some woes with OSGi
> and Tika.
>
> The TL;DR version of this email is I think the pain with tika-bundle is
> due to its attempt to encompass all the parsers in one uber jar. I
> think breaking down the tika-bundle into smaller jars (like Apache
> Camel) might provide a more maintainable approach going forward for OSGi
> and non-OSGi developers alike.
>
> I've put up a straw man implementation of smaller bundles here:
>
> https://github.com/bobpaulin/tika
> (specifically in
> https://github.com/bobpaulin/tika/tree/trunk/tika-parser-bundles)
>
> For the impatient, please check out the code above. The approach has
> zero impact on any of the current tika jar files, so nothing would have
> to change for current users, but they could choose to move to the new
> approach if it better fit their needs. It's a straw man so feel free to
> beat it up :).
>
> For the rest that have time for an explanation....
> The straw man implementation does the following:
>
> 1) Copies and inlines the tika-core.jar in the tika-osgi-bundle. (see
> pom.xml)
>
> 2) The tika-osgi-bundle registers a Tika Service that contains a Default
> Parser that is composed of all Tika Parsers registered as OSGi services
> as well as available Detectors. This extends the functionality in the
> tika-core TikaActivator class.
>
> 3) Copies and inlines classes from the tika-parsers project into specific
> bundles (see the maven-bundle-plugin section of each bundle's pom.xml).
>
> 4) Each bundle registers services for each parser and provides
> configuration for ranking (see individual Activator.java). This could
> expand considerably if we wanted to provide additional properties to
> toggle features on and off.
>
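
As a rough illustration of steps 2 and 4 above, here is a plain-Java model of a registry that parsers join with a ranking, and that answers lookups with the highest-ranked parser for a media type. It deliberately uses no OSGi or Tika types, so it is not the strawman's code: RankedParserRegistry, Entry, the parser names and the ranking values are all invented for the sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Plain-Java model only: no OSGi or Tika types are used here.
class RankedParserRegistry {

    /** Hypothetical stand-in for a parser service: name, media types, ranking. */
    static final class Entry {
        final String name;
        final Set<String> mediaTypes;
        final int ranking;

        Entry(String name, Set<String> mediaTypes, int ranking) {
            this.name = name;
            this.mediaTypes = mediaTypes;
            this.ranking = ranking;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    /** Called by each "bundle" when it starts, once per parser it provides. */
    void register(Entry entry) {
        entries.add(entry);
    }

    /** Picks the highest-ranked registered parser that claims the media type, if any. */
    Optional<String> parserFor(String mediaType) {
        return entries.stream()
                .filter(e -> e.mediaTypes.contains(mediaType))
                .max(Comparator.comparingInt(e -> e.ranking))
                .map(e -> e.name);
    }

    public static void main(String[] args) {
        RankedParserRegistry registry = new RankedParserRegistry();
        // Illustrative names and rankings, not taken from the strawman.
        registry.register(new Entry("ImageParser", Set.of("image/png", "image/gif"), 0));
        registry.register(new Entry("OcrParser", Set.of("image/png"), 10));
        System.out.println(registry.parserFor("image/png"));
        System.out.println(registry.parserFor("application/pdf"));
    }
}
```

In this model, dropping a bundle simply means its entries are never registered, so lookups for media types that only it handled come back empty while the remaining parsers keep working.
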
> This provides the following advantages to users and developers in both
> an OSGi and standard deployment:
>
> 1) These bundles could be evolved separately. The community could drive
> how finely grained the bundles were. For example, I bundled the image
> and jpeg packages together but kept Tesseract OCR separate.
>
> 2) Users can just select the parsers they want to include in their
> projects.
>
> 3) Each parser project could maintain its own optional and embedded
> bundle dependencies. Currently the tika-bundle project has a marathon
> of these entries with optional and embedded dependencies clocking in at
> ~200 lines and ~20 lines respectively.
>
> If you made it this far thanks for reading :). As an OSGi developer I
> think the implementation provides a better experience but could also
> make things better for non-OSGi developers as well. And best of all it
> works without requiring changes to any of the existing released jars.
> Let me know what you think.
>
>
> - Bob Paulin
>