Re: full text search improvements

Lukas Kahwe Smith Mon, 26 Mar 2012 06:51:59 -0700

On Mar 26, 2012, at 09:40 , Ard Schrijvers wrote:

> On Sat, Mar 24, 2012 at 3:12 PM, Lukas Kahwe Smith <[email protected]> 
> wrote:
>> Hi,
>> 
>> I am not a Jackrabbit developer but a very interested user and co-lead of 
>> the PHPCR [1] initiative.
>> I wanted to expand partially on what Ard said about potentially looking into 
>> hooking in Solr/ElasticSearch [2] but also on some other issues I see with full text 
>> search in Jackrabbit 2.x
>> 
>> 1) scaling
>> 
>> Now first up I am overall quite happy with the scalability of Jackrabbit 2.x.
>> Obviously there are two places though where at some point we need to support 
>> sharding and that is the persistence manager (which seems to be covered in 
>> the current Oak plans) and the Lucene index (which doesn't seem to be covered). 
>> Now imho there are already two perfectly fine projects working on this with 
>> Solr (the more natural choice since it's also an Apache project) and 
>> ElasticSearch (imho it provides a much better API).
>> 
>> Moreover, (optionally) leveraging these has several other advantages:
>> - mature products (especially ElasticSearch is very mature when it comes to 
>> sharding), supporting them might also attract new users to Jackrabbit
>> - handle much larger data sets via sharding
>> - provide many more full text search specific features
> 
> What our customers also want is to be able to query on what a
> document is from the end user's (customer's) point of view. Some
> customers have the author of a document stored as an 'author node'
> referenced by the 'document node'. Searching by the author's name
> then does not find the document, because the author's name is stored
> somewhere else.


Well, you can already do this via a JOIN, but I guess you are asking to be 
able to do some more denormalization during the indexing process for better 
performance.
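
To make the denormalization idea concrete, here is a minimal Python sketch
(not Jackrabbit code; the node paths, property names and helper functions are
all made up for illustration). It copies the referenced author's name into the
entry that gets indexed for each document, so a search on the author's name
finds the document without a JOIN at query time:

```python
# Hypothetical in-memory content: node path -> properties.
nodes = {
    "/authors/jane": {"type": "author", "name": "Jane Doe"},
    "/docs/report": {"type": "document", "title": "Annual report",
                     "author_ref": "/authors/jane"},
    "/docs/memo": {"type": "document", "title": "Team memo",
                   "author_ref": "/authors/jane"},
}


def build_index_entries(nodes):
    """Build one index entry per document, copying in the author's name."""
    entries = {}
    for path, props in nodes.items():
        if props.get("type") != "document":
            continue
        # Resolve the reference at indexing time instead of at query time.
        author = nodes.get(props.get("author_ref"), {})
        entries[path] = {
            "title": props["title"],
            "author_name": author.get("name", ""),
        }
    return entries


def search(entries, term):
    """Naive substring match over all indexed fields of each entry."""
    term = term.lower()
    return sorted(
        path
        for path, fields in entries.items()
        if any(term in value.lower() for value in fields.values())
    )


index_entries = build_index_entries(nodes)
print(search(index_entries, "jane"))
```

The obvious trade-off is that renaming an author means re-indexing every
document that references that author node.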

(Somewhat off topic, but we have this use case in our current application and 
we are concerned that some "meta authors" might lead to too many such 
references. I am not sure if addressing this is part of Oak, so right now we 
"partition" the referrers by date, which is OK but a bit annoying.)

>> 2) faceting
>> 
>> Now I mentioned faceting [4] above. Right now Jackrabbit does not even 
>> support COUNT() [5], which I find very painful and a major oversight. But 
>> really what people have come to expect from full text search is faceting. 
>> IMHO it's so important that it should even be part of JCR 2.1 [6] and as you 
>> can see in this link it seems like HippoCMS developers agree that it's a very 
>> useful feature to have inside Jackrabbit.
> 
> Yes, useful, but with hindsight, I wouldn't go for a seamless
> integration any more: we exposed it over virtual layers, but over
> the past years, for performance and memory reasons, I've changed my
> opinion and would rather not have faceted navigation exposed as
> virtual nodes. Still, being able to query the content over faceted
> navigation is desired by almost all customers.


OK, interesting.
Does your current solution include support for ACLs?
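
For illustration, ACL-aware faceting would mean that facet counts only include
nodes the current user may read. A rough Python sketch of that idea, assuming a
hypothetical per-hit set of readers rather than real JCR access control (the
field names and the facet_counts helper are made up):

```python
from collections import Counter

# Hypothetical search hits; "readers" stands in for an ACL check.
hits = [
    {"path": "/docs/a", "category": "news", "readers": {"alice", "bob"}},
    {"path": "/docs/b", "category": "news", "readers": {"alice"}},
    {"path": "/docs/c", "category": "blog", "readers": {"bob"}},
]


def facet_counts(hits, field, user):
    """Count facet values only over hits that the given user may read."""
    return Counter(hit[field] for hit in hits if user in hit["readers"])


print(facet_counts(hits, "category", "alice"))
print(facet_counts(hits, "category", "bob"))
```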

regards,
Lukas Kahwe Smith
[email protected]
