> Ard Schrijvers wrote:
> > It is crystal clear: When you have old format, you stay in that format, if
> > you start with new index, you get the new format. Clear and implementable
> > IMO. I can give it a try and implement it unless somebody else wants to do
> > it?
> Marcel Reutegger wrote:
> be our guest ;)

I am working on https://issues.apache.org/jira/browse/JCR-1064. Implementing
the new _:PROPERTIES_SET idea is extremely simple, and changing the
MatchAllScorer is quite trivial too. I get performance gains of a factor of
10, not only for //[EMAIL PROTECTED], but also for

  //[EMAIL PROTECTED] and @myothertext]
  //[EMAIL PROTECTED] or @myothertext]
  //*[not(@mytext)]
  //[EMAIL PROTECTED]'foo']

and for quite a few more (every place in LuceneQueryBuilder where
MatchAllQuery is used).

But while making these quite trivial changes, I realized that, AFAICS, the
MatchAllScorer becomes superfluous, and with it the creation of the sometimes
expensive filters. For example, //[EMAIL PROTECTED] and @myothertext], when I
have 10^6 nodes with the mytext property, takes ~100 ms (>1 sec for the old
MatchAllScorer). Not using the MatchAllQuery at all, but just (2 times)

  query = new TermQuery(new Term(FieldNames.PROPERTIES_SET, field));

results in about 15 ms when, for example, 10^6 nodes have the property
'mytext' and 10^2 have 'myothertext'. This result scales to many more
documents. The current implementation takes >1 sec on my computer, and the
MatchAllQuery is used in many more use cases.

Since IMO this is such a performance and scalability improvement, I want to
discuss backwards compatibility for older Jackrabbit releases whose index is
not suitable for this new approach. Checking the current index at startup and
then falling back to the old index style if no field named
FieldNames.PROPERTIES_SET is present seems a little "hacky" to me to
implement. What I would like is to let people choose between two index types
within the searchindex configuration, something like:

  <param name="index-type" value="old"/>    (old|new)

and have this value set to old for all 1.3.x releases, and to new from the
1.4.0 release onward. People can then still use the 1.4.0 version with the
old index type.
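To make the trade-off concrete, here is a toy model in plain Java. It is not
Jackrabbit or Lucene code: the class PropertyIndexSketch and its methods
addNode and nodesWithAll are hypothetical names made up for this sketch, and
the "old" mode simply scans every node, which is only an assumption standing
in for the more expensive path. The "new" mode keeps one posting list per
property name, which is the role the _:PROPERTIES_SET field would play, and
answers an "and" query with one lookup per property followed by an
intersection.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model only; every name here is hypothetical, not Jackrabbit API. */
final class PropertyIndexSketch {
    // Each node is represented only by the names of the properties it has.
    private final List<Set<String>> nodes = new ArrayList<>();
    // "new" index type: property name -> sorted ids of the nodes that have it.
    private final Map<String, TreeSet<Integer>> propertiesSet = new HashMap<>();
    private final boolean newIndexType;

    /** @param indexType value of the proposed "index-type" param: "old" or "new" */
    PropertyIndexSketch(String indexType) {
        if (!"old".equals(indexType) && !"new".equals(indexType)) {
            throw new IllegalArgumentException("index-type must be old or new");
        }
        this.newIndexType = "new".equals(indexType);
    }

    /** Adds a node with the given property names and returns its id. */
    int addNode(String... propertyNames) {
        int id = nodes.size();
        nodes.add(new HashSet<>(Arrays.asList(propertyNames)));
        if (newIndexType) {
            // Only the new index type records which properties are set.
            for (String name : propertyNames) {
                propertiesSet.computeIfAbsent(name, k -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    /** Ids of the nodes that have every one of the given properties. */
    TreeSet<Integer> nodesWithAll(String... names) {
        TreeSet<Integer> result = new TreeSet<>();
        if (names.length == 0) {
            return result;
        }
        if (!newIndexType) {
            // Old index type: no posting lists, so visit every node.
            List<String> wanted = Arrays.asList(names);
            for (int id = 0; id < nodes.size(); id++) {
                if (nodes.get(id).containsAll(wanted)) {
                    result.add(id);
                }
            }
            return result;
        }
        // New index type: one lookup per property, then intersect.
        for (int i = 0; i < names.length; i++) {
            TreeSet<Integer> postings = propertiesSet.get(names[i]);
            if (postings == null) {
                return new TreeSet<>(); // no node has this property
            }
            if (i == 0) {
                result.addAll(postings);
            } else {
                result.retainAll(postings);
            }
        }
        return result;
    }
}
```

Both modes return the same ids; only the amount of work differs. That is also
why an index written in the old style cannot serve the new lookups: the
per-property lists were never written for it.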
From 1.4.0 we could also mark "MatchAllQuery", "MatchAllScorer" and
"MatchAllWeight" as deprecated AFAICS, but I might be missing something.

So, WDOT? I would really like to push the changes into the 1.4 version,
because for *many* nodes, speedups of a hundredfold or more can be seen for
certain queries (some will gain a factor of 10, some a factor of 2, but all
will be faster).

Regards
Ard

> regards
>  marcel
