improving the scalability in searching part 2

Ard Schrijvers Wed, 08 Aug 2007 07:48:08 -0700

Problem 2:

2) The XPath jcr:like implementation, for example:

   //*[jcr:like(@mytext,'%foo bar qu%')]


A jcr:like constraint (the same holds for SQL) is translated to a Jackrabbit 
WildcardQuery, which in turn uses a WildcardTermEnum with a 
"protected boolean termCompare(Term term)" method (though I haven't sorted out 
where the exact bottleneck is).
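
To see why a leading '%' is costly, here is a minimal sketch in plain Java. It
is not the Jackrabbit or Lucene code: a sorted TreeSet stands in for the term
dictionary of one UN_TOKENIZED field (one term per whole property value), and
the names WildcardScanSketch, likeMatches and scan are made up for
illustration. It assumes the enumeration can only skip ahead by the literal
prefix in front of the first wildcard, so a pattern that starts with '%' has
to compare every term.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Illustrative sketch only; not the Jackrabbit/Lucene implementation.
class WildcardScanSketch {

    // jcr:like-style match: '%' is any run of characters, '_' is exactly one.
    // Escape handling is left out to keep the sketch short.
    static boolean likeMatches(String p, String t) {
        int pi = 0, ti = 0, star = -1, mark = 0;
        while (ti < t.length()) {
            if (pi < p.length() && p.charAt(pi) == '%') {
                star = pi++;          // remember the wildcard, first try matching nothing
                mark = ti;
            } else if (pi < p.length()
                    && (p.charAt(pi) == '_' || p.charAt(pi) == t.charAt(ti))) {
                pi++;
                ti++;
            } else if (star >= 0) {
                pi = star + 1;        // backtrack: let the last '%' absorb one more char
                ti = ++mark;
            } else {
                return false;
            }
        }
        while (pi < p.length() && p.charAt(pi) == '%') pi++;
        return pi == p.length();
    }

    // Returns {terms compared, terms matched} for one pattern.
    static int[] scan(TreeSet<String> terms, String pattern) {
        // The literal prefix before the first wildcard is the only part
        // that lets us seek into the sorted term dictionary.
        int i = 0;
        while (i < pattern.length()
                && pattern.charAt(i) != '%' && pattern.charAt(i) != '_') i++;
        String prefix = pattern.substring(0, i);

        int compared = 0, matched = 0;
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) break;   // left the prefix range, stop
            compared++;
            if (likeMatches(pattern, term)) matched++;
        }
        return new int[] {compared, matched};
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>(Arrays.asList(
                "a foo bar quux", "foo bar", "hello world", "zebra foo bar qu"));
        // Empty prefix: every term is compared.
        System.out.println(Arrays.toString(scan(terms, "%foo bar qu%"))); // [4, 2]
        // Literal prefix "foo": only the matching range is compared.
        System.out.println(Arrays.toString(scan(terms, "foo%")));         // [1, 1]
    }
}
```

With long property values the per-term comparison is itself expensive, so the
cost grows with both the number of values and their length.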

It boils down to this: searching for nodes that have some string in some 
property implies scanning UN_TOKENIZED fields in Lucene, which is, IMHO, not 
the way to do it (though I haven't yet got *the* solution for the wildcard 
parts). Without the wildcards, a PhraseQuery on the indexed TOKENIZED property 
field <X:FULL:myproperty> would obviously do.
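
For the wildcard-free case, the sketch below shows why a tokenized field makes
the lookup cheap: a positional inverted index answers a phrase by looking up
each word directly and checking that the positions are consecutive, which is
the idea behind a phrase query. It is plain Java, not Lucene;
PhraseLookupSketch and its methods are hypothetical names, the tokenizer
(lowercase, split on whitespace) is an assumption, and the partial word 'qu%'
from the example is deliberately not handled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative sketch only; not how Lucene stores or scores anything.
class PhraseLookupSketch {

    // word -> (document id -> positions of that word in the document)
    private final Map<String, Map<Integer, List<Integer>>> postings = new HashMap<>();

    // Tokenizes the text and records the position of every word.
    void add(int docId, String text) {
        String[] tokens = text.trim().toLowerCase().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            postings.computeIfAbsent(tokens[pos], k -> new TreeMap<>())
                    .computeIfAbsent(docId, k -> new ArrayList<>())
                    .add(pos);
        }
    }

    // Returns the ids of documents that contain the words consecutively.
    Set<Integer> phrase(String... words) {
        Set<Integer> hits = new TreeSet<>();
        if (words.length == 0) return hits;
        Map<Integer, List<Integer>> first =
                postings.getOrDefault(words[0], Collections.emptyMap());
        for (Map.Entry<Integer, List<Integer>> e : first.entrySet()) {
            for (int start : e.getValue()) {
                boolean ok = true;
                // Word i of the phrase must sit at position start + i.
                for (int i = 1; i < words.length && ok; i++) {
                    List<Integer> positions = postings
                            .getOrDefault(words[i], Collections.emptyMap())
                            .get(e.getKey());
                    ok = positions != null && positions.contains(start + i);
                }
                if (ok) {
                    hits.add(e.getKey());
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        PhraseLookupSketch index = new PhraseLookupSketch();
        index.add(1, "the foo bar quux");
        index.add(2, "bar foo");
        index.add(3, "foo bar");
        System.out.println(index.phrase("foo", "bar")); // [1, 3]
    }
}
```

Only the postings of the words in the phrase are touched, so the work does not
depend on how many other values or words are in the index.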

Anyway, the current jcr:like results in queries taking up to 10 seconds to 
complete for only 1000 nodes with one property, "mytext", which is on average 
500 words long. A cached IndexReader won't make it any faster.

I think jcr:like is not usable with the current implementation. Perhaps 
somebody else knows how to use the PhraseQuery in a way that suits our needs 
(I have already asked on the Lucene list whether there is a best way to 
implement 'like' functionality).

Regards Ard

-- 

Hippo
Oosteinde 11
1017WT Amsterdam
The Netherlands
Tel  +31 (0)20 5224466
-------------------------------------------------------------
[EMAIL PROTECTED] / [EMAIL PROTECTED] / http://www.hippo.nl
--------------------------------------------------------------
