Bertrand Delacretaz wrote:
> On 8/8/07, Ard Schrijvers <[EMAIL PROTECTED]> wrote:
> > ...2) The XPath jcr:like implementation, for example :
> > //*[jcr:like(@mytext,'%foo bar qu%')]
> >
> > ...the current jcr:like results in queries taking up to 10 seconds to complete for only
> > 1000 nodes with one property, "mytext" which is on average 500 words long....
>
> Just curious, is
>
> %foo bar qu%
>
> much slower than
>
> foo bar qu%
>
> ?
>
> I'd guess so, as Lucene-based indexes are usually inefficient with
> leading wildcards. Do your tests confirm that?

Yes, they do. A leading wildcard is incredibly slow for text bodies. Using only trailing wildcards seems to be fast enough, though it probably scales linearly with the number of documents, since it is probably implemented as a startsWith check on a Lucene field value.

For a leading wildcard, I think some sort of two-step filter might work: first expand the leading-wildcard term to all indexed terms that end with it, then search the full text for documents that match those terms, and finally run the current filter over this reduced set. WDYT? The org.apache.lucene.misc.ChainedFilter seems suitable for the job, though I haven't worked with it yet.

Regards
Ard

> -Bertrand
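To make the two-step proposal concrete, here is a minimal sketch in plain Java. It deliberately does not use Lucene or ChainedFilter: the "index" is a hypothetical in-memory TreeMap from terms to document ids, and the class and method names (TwoStepLike, prefix, suffix, like, toRegex) are made up for illustration. It assumes a pattern of the shape '%w1 w2 ... wn%' with at least two words made of word characters only. The contrast between prefix() and suffix() also shows why, in a sorted term dictionary like this one, a trailing wildcard is cheap (seek to the prefix, stop when it no longer matches) while a leading wildcard forces a walk over every term.

```java
import java.util.*;
import java.util.regex.Pattern;

// Toy model of the two-step idea for a jcr:like pattern with a leading wildcard.
// The "index" is a hypothetical in-memory term dictionary, not Lucene's API.
class TwoStepLike {
    // Sorted term -> ids of the documents containing that term.
    private final TreeMap<String, Set<Integer>> terms = new TreeMap<>();
    // Stored full text, consulted only in the final exact-match step.
    private final Map<Integer, String> texts = new HashMap<>();

    void add(int doc, String text) {
        texts.put(doc, text);
        for (String t : text.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) {
                terms.computeIfAbsent(t, k -> new TreeSet<>()).add(doc);
            }
        }
    }

    // Trailing wildcard ("qu%"): seek to the prefix and stop as soon as the
    // sorted terms no longer start with it.
    Set<Integer> prefix(String p) {
        Set<Integer> docs = new TreeSet<>();
        for (Map.Entry<String, Set<Integer>> e : terms.tailMap(p, true).entrySet()) {
            if (!e.getKey().startsWith(p)) {
                break;
            }
            docs.addAll(e.getValue());
        }
        return docs;
    }

    // Leading wildcard ("%foo"): sort order does not help, so every term is checked.
    Set<Integer> suffix(String s) {
        Set<Integer> docs = new TreeSet<>();
        for (Map.Entry<String, Set<Integer>> e : terms.entrySet()) {
            if (e.getKey().endsWith(s)) {
                docs.addAll(e.getValue());
            }
        }
        return docs;
    }

    // Assumes the shape '%w1 w2 ... wn%' with n >= 2 and word characters only.
    List<Integer> like(String pattern) {
        String[] words = pattern.substring(1, pattern.length() - 1).toLowerCase().split(" ");
        // Step 1: expand '%w1' to all terms ending with w1.
        Set<Integer> candidates = new TreeSet<>(suffix(words[0]));
        // Step 2: keep only documents that also contain the middle terms and the 'wn%' prefix.
        for (int i = 1; i < words.length - 1; i++) {
            candidates.retainAll(terms.getOrDefault(words[i], Collections.emptySet()));
        }
        candidates.retainAll(prefix(words[words.length - 1]));
        // Step 3: run the exact like-match only over the candidate set.
        Pattern regex = toRegex(pattern);
        List<Integer> hits = new ArrayList<>();
        for (int doc : candidates) {
            if (regex.matcher(texts.get(doc).toLowerCase()).matches()) {
                hits.add(doc);
            }
        }
        return hits;
    }

    // '%' matches any run of characters, '_' exactly one; everything else is literal.
    static Pattern toRegex(String like) {
        StringBuilder sb = new StringBuilder();
        for (char c : like.toLowerCase().toCharArray()) {
            if (c == '%') {
                sb.append(".*");
            } else if (c == '_') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString(), Pattern.DOTALL);
    }

    public static void main(String[] args) {
        TwoStepLike idx = new TwoStepLike();
        idx.add(1, "A snafoo bar quux appears here");
        idx.add(2, "foo and bar and quux, but not adjacent");
        idx.add(3, "Nothing relevant");
        System.out.println(idx.like("%foo bar qu%")); // prints [1]
    }
}
```

Running main prints [1]. Document 2 contains all three terms, just not adjacently, so it survives the term-level steps and is rejected only by the final exact match; the stored text is read only for the surviving candidates. Note that suffix() still visits every distinct term, so the expansion cost grows with the size of the term dictionary rather than with the amount of text. How the candidate set would map onto a Lucene filter (for example via ChainedFilter) is not shown here.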
