Hi Bob,

I'd say decomposition into smaller bundles is the way to go. In my
experience, OSGi bundles with too many dependencies are fragile and hard to
maintain. In the worst case, a regression in a maven-bundle-plugin
configuration would break a single parser bundle rather than all of them,
as happens with the uber-jar.

Static linking of dependencies should be fine. However, it can increase
the total size of the Tika distro, because different parser bundles may
embed the same transitive dependencies (Apache Commons, etc.). The big
advantage is that static linking makes the bundles self-contained.
The alternative is to make dependencies optional, but in that case clients
will have to solve the puzzle of adding them to their OSGi containers.
It's doable, but it will kill adoption.
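
To make the two options concrete, here is a rough sketch of what the
maven-bundle-plugin instructions could look like for one parser bundle.
The scope filter and the wildcard import are illustrative assumptions on
my part, not the actual Tika configuration:

```xml
<plugin>
  <groupId>org.apache.felix</groupId>
  <artifactId>maven-bundle-plugin</artifactId>
  <extensions>true</extensions>
  <configuration>
    <instructions>
      <!-- Static linking: embed the module's compile/runtime dependencies,
           including transitive ones, inside the bundle jar. -->
      <Embed-Dependency>*;scope=compile|runtime</Embed-Dependency>
      <Embed-Transitive>true</Embed-Transitive>

      <!-- Alternative: embed nothing and mark the imports as optional, so
           the bundle resolves even when a dependency is absent and the
           client has to install it in the container.
      <Import-Package>*;resolution:=optional</Import-Package>
      -->
    </instructions>
  </configuration>
</plugin>
```

With the embedded variant, every parser bundle that needs a shared library
carries its own copy, which is where the size increase comes from.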


 Regards,
 Yegor

On Thu, Aug 27, 2020 at 5:24 AM Bob Paulin <[email protected]> wrote:

> Hi,
>
> I wanted to discuss OSGi support in Tika 2.0.  My current thought is to
> start with the minimum support which is to add bundle packaging to each of
> the modules [1].  This will make the bundles usable in OSGi but will leave
> users on their own for putting the right dependencies together for usage.
> From there we either stop or we can choose from a few different options:
>
> 1) Tika Bundle
>
> This is an all-encompassing uber jar with all the parsers and
> dependencies we can legally get away with shipping with an Apache license.
>
> Pros
>
> Low bar to entry for novice OSGi users
>
> Already exists in Tika 1.x
>
> Cons
>
> Difficult to maintain (very complicated maven-bundle-plugin config).  This
> has broken in several releases leaving it unusable.
>
>
> 2) Tika module convenience bundles
>
> This was part of the early 2.0 POC branch where each module had its own
> tika-bundle with just its dependencies statically included.
>
> Pros
>
> Less sophisticated maven-bundle-plugin configuration
>
> Low bar for novice OSGi users
>
> Cons
>
> More sub-modules to maintain.
>
>
> There are of course other options but I think it's important to decide if
> either, neither, or both of these options should be considered for the
> initial 2.0 release.
>
>
> - Bob
>
>
> [1]  https://github.com/apache/tika/pull/344
>
>
>
