Re: Getting rid of jackrabbit-text-extractors

Jukka Zitting Wed, 08 Apr 2009 05:13:59 -0700

Hi,

On Wed, Apr 8, 2009 at 1:43 PM, Marcel Reutegger
<[email protected]> wrote:
> On Tue, Apr 7, 2009 at 23:29, Jukka Zitting <[email protected]> wrote:
>> Thus Jackrabbit 1.6 would no longer contain a separate text-extractors
>> jar, but all the existing TextExtractor classes would still be
>> included. In Jackrabbit 2.0 we'd drop all the TextExtractors and only
>> use Tika Parsers.
>
> hmm, this adds quite some dependencies to jackrabbit-core.


Currently we already have quite a few parsing dependencies through
jackrabbit-text-extractors. Tika has even more, but with JCR-1878
we're already including them.

There's been some discussion in Tika about splitting Tika into a core
jar with no dependencies (or just a few like commons-io), and a
separate parser jar (or more) that contain the Parser implementations
that depend on the various parser libraries like POI. I could push
that idea forward in Tika if it would be useful in Jackrabbit.

> What if we kept the dependency from jackrabbit-core to
> jackrabbit-jcr-tests at version 1.5 but at the same time flag it
> optional? That would remove it from the dependency tree but you'd
> still have it in the pom (until we remove it in 2.0).

(I assume you mean jackrabbit-text-extractors)

The SearchIndex class currently has a hard dependency on TextExtractor
that also needs to be satisfied at runtime, so we can't make the
text-extractors dependency optional without changing things. I'd
prefer to replace that dependency with one on the Tika Parser
interface, but then we need a hard Maven dependency on Tika.
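
To illustrate the kind of bridge this implies, here is a minimal
sketch of an adapter that lets code written against an extractor-style
interface run on top of a parser-style one. LegacyExtractor and
SimpleParser are hypothetical stand-ins defined only for this example.
They are simplified and are not the real Jackrabbit TextExtractor or
Tika Parser interfaces, and PLAIN_TEXT is a toy parser.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;

public class ExtractorBridgeSketch {

    /** Hypothetical stand-in for an extractor-style contract. */
    interface LegacyExtractor {
        Reader extractText(InputStream stream, String type, String encoding)
                throws IOException;
    }

    /** Hypothetical stand-in for a parser-style contract (not the Tika API). */
    interface SimpleParser {
        void parse(InputStream stream, StringBuilder sink, String type)
                throws IOException;
    }

    /** Adapter: exposes a SimpleParser through the LegacyExtractor contract. */
    static final class ParserBackedExtractor implements LegacyExtractor {
        private final SimpleParser parser;

        ParserBackedExtractor(SimpleParser parser) {
            this.parser = parser;
        }

        @Override
        public Reader extractText(InputStream stream, String type, String encoding)
                throws IOException {
            // The encoding hint is ignored in this sketch; the parser decides.
            StringBuilder sink = new StringBuilder();
            parser.parse(stream, sink, type);
            return new StringReader(sink.toString());
        }
    }

    /** Toy parser that treats every input as UTF-8 plain text. */
    static final SimpleParser PLAIN_TEXT = (stream, sink, type) ->
            sink.append(new String(stream.readAllBytes(), StandardCharsets.UTF_8));

    /** Drains a Reader into a String. */
    static String readAll(Reader reader) throws IOException {
        StringBuilder out = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf)) != -1) {
            out.append(buf, 0, n);
        }
        return out.toString();
    }
}
```

With an adapter of this shape, callers that only know the
extractor-style contract would not need to change, while the actual
parsing is delegated to whatever parser is plugged in.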

In either case I think it's best for everyone if the current
TextExtractor classes remain on the runtime classpath (in either
the text-extractors or the core jar) so that there's no need to modify
existing configurations.

BR,

Jukka Zitting
