Hi, Indeed BMW is only about disjunctions but the paper ( http://engineering.nyu.edu/~suel/papers/bmw.pdf) shortly describes how block max indexes can be used to speed up conjunctions as well using a simple algorithm that they call Block Max And, which we implemented in the BlockMaxConjunctionScorer class.
Le ven. 16 avr. 2021 à 18:51, Tomás Fernández Löbbe <tomasflo...@gmail.com> a écrit : > I was looking at the nightly benchmarks[1] and noticed a big jump in > performance for conjunction queries when LUCENE-8060 was merged. I was > puzzled because I didn't expect BMW to help in this type of queries, but I > guess that's the "other optimizations" you were talking about? Do you have > any pointers to those? > > > [1] https://home.apache.org/~mikemccand/lucenebench > > On Thu, Jul 11, 2019 at 6:02 AM Atri Sharma <a...@linux.com> wrote: > > > Note that any other scoring mode (COMPLETE or COMPLETE_NO_SCORES) will > > mandatorily visit all hits, so there is no scope of skipping and hence > > no point of using impacts. > > > > On Thu, Jul 11, 2019 at 8:51 AM Wu,Yunfeng <wuyunfen...@baidu.com> > wrote: > > > > > > > > > @Adrien Grand <jpou...@gmail.com<mailto:jpou...@gmail.com>>. Thanks > for > > your reply. > > > > > > The explanation ` skip low-scoring matches` is great, I looked up > some > > docs and inspect some related code. > > > > > > I noticed the ` block-max WAND` mode only work when > > ScoreMode.TOP_SCORES is used, is right? (The basic TermQuery would > > generate ImpactDISI with scoreMode is TOP_SCORES.) > > > > > > Lucene compute max score per block and then cached in `MaxScoreCache` , > > this means we can skip low-scoring block( current one block 128 DocIds) > > and in competitive block still need to score any docId as seen, I > > confused with `MaxScoreCache#getMaxScoreForLevel(int level)`, what the > > level mean? Skip level? (Somewhere invoke this method pass one Integer > > upTo param) > > > > > > Thanks Lucene Team > > > > > > > > > 在 2019年7月10日,下午10:52,Adrien Grand <jpou...@gmail.com<mailto: > > jpou...@gmail.com>> 写道: > > > > > > To clarify, the scoring process is not accelerated because we > > > terminate early but because we can skip low-scoring matches (there > > > might be competitive hits at the very end of the index). > > > > > > CompetitiveImpactAccumulator is indeed related to WAND. It helps store > > > the maximum score impacts per block of documents in postings lists. > > > Then this information is leveraged by block-max WAND in order to skip > > > low-scoring blocks. > > > > > > This does indeed help avoid reading norms, but also document IDs and > > > term frequencies. > > > > > > On Wed, Jul 10, 2019 at 4:10 PM Wu,Yunfeng <wuyunfen...@baidu.com > > <mailto:wuyunfen...@baidu.com>> wrote: > > > > > > Hi, > > > > > > We discuss some topic from > > https://github.com/apache/lucene-solr/pull/595. As Atri Sharma propose > > discuss with the java dev list. > > > > > > > > > Impact `frequency ` and `norm ` just to accelerate the `score process` > > which `terminate early`. > > > > > > In impact mode, `CompetitiveImpactAccumulator` will record (freq, norm) > > pair , would stored at index level. Also I noted > > `CompetitiveImpactAccumulator` commented with `This class accumulates the > > (freq, norm) pairs that may produce competitive scores`, maybe related > to > > `WAND`? > > > > > > > > > The norm value which produced or consumed by `Lucene80NormsFormat`. > > > > > > In this ` Impact way`, we can avoid read norms from > > `Lucene80NormsProducer` that may generate the extra IO? ( the norm value > > Lucene stored twice.)and take full advantage of the WAND method? > > > > > > > > > > > > -- > > > Adrien > > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > > For additional commands, e-mail: java-user-h...@lucene.apache.org > > > > >