Hi,

On Wed, Apr 8, 2009 at 1:43 PM, Marcel Reutegger <marcel.reuteg...@gmx.net> wrote:
> On Tue, Apr 7, 2009 at 23:29, Jukka Zitting <jukka.zitt...@gmail.com> wrote:
>> Thus Jackrabbit 1.6 would no longer contain a separate text-extractors
>> jar, but all the existing TextExtractor classes would still be
>> included. In Jackrabbit 2.0 we'd drop all the TextExtractors and only
>> use Tika Parsers.
>
> hmm, this adds quite some dependencies to jackrabbit-core.

Currently we already have quite a few parsing dependencies through
jackrabbit-text-extractors. Tika has even more, but with JCR-1878 we're
already including them.

There's been some discussion in Tika about splitting Tika into a core
jar with no dependencies (or just a few like commons-io), and a separate
parser jar (or more) that contains the Parser implementations that
depend on the various parser libraries like POI. I could push that idea
forward in Tika if it would be useful in Jackrabbit.

> What if we kept the dependency from jackrabbit-core to
> jackrabbit-jcr-tests at version 1.5 but at the same time flag it
> optional? That would remove it from the dependency tree but you'd
> still have it in the pom (until we remove it in 2.0).

(I assume you mean jackrabbit-text-extractors)

The SearchIndex class currently has a hard dependency on TextExtractor
that also needs to be there at runtime, so we can't make the
text-extractors dependency optional without changing things. I'd prefer
to replace that dependency with one on the Tika Parser interface, but
then we need a hard Maven dependency on Tika.

In either case I think it's best for everyone if the current
TextExtractor classes remain in the runtime classpath (in either the
text-extractors or the core jar) so that there's no need to modify
existing configurations.

BR,

Jukka Zitting
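
PS. For reference, the optional flag suggested above would look roughly like this in the jackrabbit-core pom. This is only a sketch: the exact version number is an assumption based on the "version 1.5" mentioned above, and as noted it would not work at runtime without first changing SearchIndex.

```xml
<!-- Sketch only: an entry inside <dependencies> of the jackrabbit-core pom -->
<dependency>
  <groupId>org.apache.jackrabbit</groupId>
  <artifactId>jackrabbit-text-extractors</artifactId>
  <!-- assumed version, based on the "version 1.5" mentioned above -->
  <version>1.5.0</version>
  <!-- optional: not passed on transitively to projects that depend on jackrabbit-core -->
  <optional>true</optional>
</dependency>
```

With this, anyone who still needs the TextExtractor classes would have to declare the text-extractors dependency explicitly in their own pom.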