Hi Bob,

I'd say decomposition into smaller bundles is the way to go. In my experience, OSGi bundles with too many dependencies are fragile and hard to maintain. With smaller bundles, a regression in a maven-bundle-plugin configuration would, in the worst case, break a single parser bundle rather than all of them at once in the uber-jar.
Static linking of dependencies should be fine; however, it can increase the total size of the Tika distribution, because different parser bundles may embed the same transitive dependencies (Apache Commons, etc.). The big advantage is that static linking makes each bundle self-contained.

The alternative is to make dependencies optional, but in that case clients will have to solve the puzzle of adding them to their OSGi containers themselves. It's doable, but it will kill adoption.

Regards,
Yegor

On Thu, Aug 27, 2020 at 5:24 AM Bob Paulin wrote:
> Hi,
>
> I wanted to discuss OSGi support in Tika 2.0. My current thought is to
> start with the minimum support, which is to add bundle packaging to each of
> the modules [1]. This will make the bundles usable in OSGi but will leave
> users on their own for putting the right dependencies together for usage.
> From there we either stop or we can choose from a few different options:
>
> 1) Tika Bundle
>
> This is an all-encompassing uber jar with all the parsers and
> dependencies we can legally get away with shipping with an Apache license.
>
> Pros
>
> Low bar to entry for novice OSGi users
>
> Already exists in Tika 1.x
>
> Cons
>
> Difficult to maintain (very complicated maven-bundle-plugin config). This
> has broken in several releases, leaving it unusable.
>
>
> 2) Tika module convenience bundles
>
> This was part of the early 2.0 POC branch, where each module had its own
> tika-bundle with just its dependencies statically included.
>
> Pros
>
> Less sophisticated maven-bundle-plugin configuration
>
> Low bar for novice OSGi users
>
> Cons
>
> More sub-modules to maintain.
>
>
> There are of course other options, but I think it's important to decide if
> either, neither, or both of these options should be considered for the
> initial 2.0 release.
>
>
> - Bob
>
>
> [1] https://github.com/apache/tika/pull/344
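To make the size trade-off of static linking a bit more concrete, here is a rough Java sketch (not part of Tika; the bundle and artifact names are hypothetical and chosen only for illustration). It counts how many per-parser bundles embed each dependency, and how many copies are redundant compared with a single uber-jar that would carry each dependency once.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public final class EmbeddedDependencyOverlap {

    /** Counts how many bundles embed each dependency (duplicates within one bundle are ignored). */
    static Map<String, Integer> embedCounts(Map<String, List<String>> embedsByBundle) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted for stable output
        for (List<String> deps : embedsByBundle.values()) {
            for (String dep : Set.copyOf(deps)) {
                counts.merge(dep, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Extra copies beyond the single copy an uber-jar would carry for each dependency. */
    static int redundantCopies(Map<String, List<String>> embedsByBundle) {
        int redundant = 0;
        for (int n : embedCounts(embedsByBundle).values()) {
            redundant += n - 1;
        }
        return redundant;
    }

    public static void main(String[] args) {
        // Hypothetical bundles and the artifacts each one embeds statically.
        Map<String, List<String>> embeds = Map.of(
            "parser-bundle-a", List.of("commons-io", "commons-lang3"),
            "parser-bundle-b", List.of("commons-io", "commons-compress"),
            "parser-bundle-c", List.of("commons-io", "commons-compress"));

        System.out.println(embedCounts(embeds));
        System.out.println("redundant copies: " + redundantCopies(embeds));
    }
}
```

With this made-up data it prints `{commons-compress=2, commons-io=3, commons-lang3=1}` and `redundant copies: 3`; real numbers would come from each bundle's actual embedded dependency list, and weighting by jar size would show the actual growth of the distribution.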
