> Ard Schrijvers wrote:
> > It is crystal clear: When you have old format, you stay in that format, if
> > you start with new index, you get the new format. Clear and implementable
> > IMO. I can give it a try and implement it unless somebody else wants to do
> > it?

> Marcel Reutegger wrote:
> be our guest ;)

I am working on https://issues.apache.org/jira/browse/JCR-1064. Implementing 
the new _:PROPERTIES_SET idea is extremely simple, and changing the 
MatchAllScorer is quite trivial too. I get performance gains of a factor of 10, 
not only for //[EMAIL PROTECTED], but also for

//[EMAIL PROTECTED] and @myothertext]
//[EMAIL PROTECTED] or @myothertext]
//*[not(@mytext)]
//[EMAIL PROTECTED]'foo'] 

and for quite a few more (every place in LuceneQueryBuilder where MatchAllQuery 
is used).

But while making these fairly trivial changes, I realized that, AFAICS, the 
MatchAllScorer becomes superfluous, and with it the creation of its sometimes 
expensive filters. For example, 

//[EMAIL PROTECTED] and @myothertext] takes ~100 ms when I have 10^6 nodes with 
the mytext property (> 1 sec with the old MatchAllScorer).

Not using the MatchAllQuery at all, but just two term queries of the form

query = new TermQuery(new Term(FieldNames.PROPERTIES_SET, field)); 

results in about 15 ms when, for example, 10^6 nodes have the property 'mytext' 
and 10^2 have 'myothertext'. This result scales to many more documents. The 
current implementation takes > 1 sec on my computer, and the MatchAllQuery is 
used in many more use cases.
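
To illustrate why plain term lookups on a "properties set" field can be so much 
cheaper than scoring every document, here is a toy in-memory sketch. It is not 
Jackrabbit or Lucene code: the class PropertiesSetSketch and its methods are 
hypothetical names made up for this example, and a BitSet per property name 
stands in for the postings of the _:PROPERTIES_SET field.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Toy model of a "properties set" field (hypothetical; not the real patch). */
final class PropertiesSetSketch {
    // property name -> numbers of the nodes (documents) that carry that property
    private final Map<String, BitSet> postings = new HashMap<>();

    /** Index a node by recording every property name it has. */
    void addNode(int docId, String... propertyNames) {
        for (String name : propertyNames) {
            postings.computeIfAbsent(name, k -> new BitSet()).set(docId);
        }
    }

    /** Nodes having the property: one lookup, comparable to a single term query. */
    BitSet having(String propertyName) {
        BitSet hits = postings.get(propertyName);
        return hits == null ? new BitSet() : (BitSet) hits.clone();
    }

    /** Nodes having both properties: intersection of two lookups. */
    BitSet havingAll(String first, String second) {
        BitSet result = having(first);
        result.and(having(second));
        return result;
    }

    /** Nodes having at least one of the properties: union of two lookups. */
    BitSet havingAny(String first, String second) {
        BitSet result = having(first);
        result.or(having(second));
        return result;
    }

    /** Nodes in [0, maxDoc) that lack the property: complement of one lookup. */
    BitSet notHaving(String propertyName, int maxDoc) {
        BitSet result = new BitSet(maxDoc);
        result.set(0, maxDoc);
        result.andNot(having(propertyName));
        return result;
    }
}
```

The point of the sketch is only that each predicate touches the entries for the 
named properties instead of visiting every node's fields; how close the real 
index gets to this depends on the actual Lucene query implementation.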

Since IMO this is such a big performance and scalability improvement, I want to 
discuss backwards compatibility for older Jackrabbit releases whose index is 
not suitable for this new approach. Checking the current index at startup and 
then falling back to the old index style if no FieldNames.PROPERTIES_SET field 
is present seems a little "hacky" to me to implement. What I would like is to 
let people choose between two index types within the searchindex configuration, 
something like:

<param name="index-type" value="old"/> old|new

and have this value set to old for all 1.3.x releases, and set to new from the 
1.4.0 release on. People can then still use the 1.4.0 version with the old 
index type. From 1.4.0 we could also mark "MatchAllQuery", "MatchAllScorer" 
and "MatchAllWeight" as deprecated AFAICS, but I might be missing something. 
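
As a rough sketch of how such a parameter could be read (purely illustrative: 
the IndexType enum and fromParam method are hypothetical names, not existing 
Jackrabbit API), the release would pass in its own default and only an explicit 
"old" or "new" would override it:

```java
import java.util.Locale;

/** Hypothetical representation of the proposed index-type parameter. */
enum IndexType {
    OLD, NEW;

    /**
     * Parse the configured value. A missing or blank value yields the
     * release default (OLD for 1.3.x, NEW from 1.4.0 in the proposal).
     */
    static IndexType fromParam(String value, IndexType releaseDefault) {
        if (value == null || value.trim().isEmpty()) {
            return releaseDefault;
        }
        switch (value.trim().toLowerCase(Locale.ROOT)) {
            case "old":
                return OLD;
            case "new":
                return NEW;
            default:
                // Fail loudly rather than silently picking an index format.
                throw new IllegalArgumentException(
                        "index-type must be 'old' or 'new', got: " + value);
        }
    }
}
```

Rejecting unknown values instead of falling back is a design choice in this 
sketch, made so that a typo cannot silently select an index format.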

So, WDOT? I would really like to get these changes into the 1.4 version, 
because with *many* nodes, speedups of more than a hundred times can be seen 
for certain queries (some will gain a factor of 10, some a factor of 2, but all 
will be faster). 

Regards Ard

> 
> regards
>   marcel
> 
