Hi,

JCR-1878 is now resolved, and Jackrabbit trunk now depends on Apache Tika for text extraction. Thus there is little need left for jackrabbit-text-extractors as a standalone component. Anyone who needs that functionality separately from jackrabbit-core should use Tika directly.
For backwards compatibility with existing configurations (and potential extensions) we still need the current org.apache.jackrabbit.extractor classes, but I'm thinking of simply moving the entire package to jackrabbit-core and deprecating everything except the new Tika-based extractor.

In fact, I'd go as far as changing the indexing code in jackrabbit-core to use the Tika Parser interface directly, and providing only a backwards-compatibility layer for our existing TextExtractor classes. Thus Jackrabbit 1.6 would no longer contain a separate text-extractors jar, but all the existing TextExtractor classes would still be included. In Jackrabbit 2.0 we'd drop all the TextExtractors and use only Tika Parsers.

BR,
Jukka Zitting
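A minimal sketch of the kind of backwards-compatibility layer proposed above, assuming it takes the form of an adapter that wraps a legacy extractor so that indexing code which only speaks a parser interface can keep using it. To stay self-contained, `LegacyTextExtractor`, `SimpleParser`, `LegacyExtractorParser` and `PlainTextExtractor` are hypothetical names. `LegacyTextExtractor` is modeled loosely on the `org.apache.jackrabbit.extractor.TextExtractor` contract, while `SimpleParser` is a simplified stand-in: the real Tika `Parser` streams SAX events into a `ContentHandler` and fills in `Metadata` rather than returning a String.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExtractorAdapterSketch {

    /** Hypothetical stand-in modeled loosely on the legacy TextExtractor contract. */
    interface LegacyTextExtractor {
        String[] getContentTypes();

        Reader extractText(InputStream stream, String type, String encoding) throws IOException;
    }

    /** Hypothetical, simplified parser interface; Tika's real Parser emits SAX events instead. */
    interface SimpleParser {
        String parse(InputStream stream, String type) throws IOException;
    }

    /** Adapter: exposes a legacy extractor through the parser interface used by the indexer. */
    static final class LegacyExtractorParser implements SimpleParser {
        private final LegacyTextExtractor extractor;

        LegacyExtractorParser(LegacyTextExtractor extractor) {
            this.extractor = extractor;
        }

        @Override
        public String parse(InputStream stream, String type) throws IOException {
            // Delegate to the legacy extractor and drain its Reader into a String.
            try (Reader reader = extractor.extractText(stream, type, null)) {
                StringWriter out = new StringWriter();
                char[] buffer = new char[4096];
                int n;
                while ((n = reader.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                return out.toString();
            }
        }
    }

    /** Hypothetical legacy extractor for plain text, used only for the demo. */
    static final class PlainTextExtractor implements LegacyTextExtractor {
        @Override
        public String[] getContentTypes() {
            return new String[] {"text/plain"};
        }

        @Override
        public Reader extractText(InputStream stream, String type, String encoding) {
            // Fall back to UTF-8 when no encoding is given (an assumption for this sketch).
            Charset charset = encoding != null ? Charset.forName(encoding) : StandardCharsets.UTF_8;
            return new InputStreamReader(stream, charset);
        }
    }

    public static void main(String[] args) throws IOException {
        SimpleParser parser = new LegacyExtractorParser(new PlainTextExtractor());
        InputStream in = new ByteArrayInputStream("hello jackrabbit".getBytes(StandardCharsets.UTF_8));
        System.out.println(parser.parse(in, "text/plain")); // prints "hello jackrabbit"
    }
}
```

The point of the adapter direction (legacy extractor wrapped as a parser, not the other way round) is that the indexing code only ever has to know about one interface, so dropping the TextExtractors later means deleting the adapter rather than touching the indexer.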