RE: improving the scalability in searching part 2

Ard Schrijvers Tue, 14 Aug 2007 07:29:57 -0700

> On 8/8/07, Ard Schrijvers <[EMAIL PROTECTED]> wrote:
> > ...2) The XPath jcr:like implementation, for example:
> > //*[jcr:like(@mytext,'%foo bar qu%')]
> > ...the current jcr:like results in queries taking up to 10 seconds
> > to complete for only 1000 nodes with one property, "mytext", which
> > is on average 500 words long....
>

Bertrand Delacretaz wrote:

> Just curious, is
> 
>   %foo bar qu%
> 
> much slower than
> 
>   foo bar qu%
> 
> ?
> 
> I'd guess so, as Lucene-based indexes are usually inefficient with
> leading wildcards. Do your tests confirm that?

Yes, they do. A leading wildcard is incredibly slow on large text bodies.
Trailing wildcards alone seem fast enough, though they probably scale
linearly with the number of documents, since the match is probably done with
a startsWith on a Lucene field value. For a leading wildcard, I think some
sort of two-step filter might work: first expand the first term to all
indexed terms that end with it, then look up the documents whose full text
contains one of those terms, and finally run the current filter over that
reduced set only. WDYT?

The org.apache.lucene.misc.ChainedFilter seems suitable for the job, though I 
haven't worked with it yet. 
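
To make the two-step idea concrete, here is a rough sketch in plain Java. It
does not use Lucene or Jackrabbit at all: the class TwoStepLikeSketch, its
methods and the toy inverted index are made up for illustration only. It also
assumes the pattern starts with a word character and contains at least two
words, so that the first word is always the tail of an indexed term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical illustration; none of these names exist in Jackrabbit or Lucene. */
class TwoStepLikeSketch {
    private final List<String> docs = new ArrayList<>();
    // Toy inverted index: term -> ids of the documents containing that term.
    private final Map<String, Set<Integer>> postings = new TreeMap<>();

    /** Stores the text and indexes its lower-cased word tokens. */
    void add(String text) {
        int id = docs.size();
        docs.add(text);
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
            }
        }
    }

    /** Step 1: expand the first word to every indexed term ending with it
     *  and union the documents of those terms. */
    Set<Integer> candidates(String firstWord) {
        Set<Integer> result = new TreeSet<>();
        for (Map.Entry<String, Set<Integer>> e : postings.entrySet()) {
            if (e.getKey().endsWith(firstWord)) {
                result.addAll(e.getValue());
            }
        }
        return result;
    }

    /** Step 2: run the expensive substring check on the candidates only.
     *  Assumes the pattern body starts with a word and has at least two words. */
    List<Integer> like(String infix) {
        String needle = infix.toLowerCase(Locale.ROOT);
        String[] words = needle.split("\\W+");
        String firstWord = words.length > 0 ? words[0] : "";
        List<Integer> hits = new ArrayList<>();
        for (int id : candidates(firstWord)) {
            if (docs.get(id).toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(id);
            }
        }
        return hits;
    }
}
```

Step 1 still walks the whole term dictionary, but that is usually far smaller
than the total amount of text, and the costly substring check in step 2 only
touches documents that contain a term ending in the first word.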

Regards Ard 

> 
> -Bertrand
>
