I was looking at the nightly benchmarks[1] and noticed a big jump in performance for conjunction queries when LUCENE-8060 was merged. I was puzzled because I didn't expect BMW to help with this type of query, but I guess those are the "other optimizations" you were talking about? Do you have any pointers to those?
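For reference, a minimal sketch of the kind of top-k conjunction search where this path can kick in; the index path, field, and terms are placeholders, and the 1000 totalHitsThreshold is just an example value. The point is that only a collector advertising ScoreMode.TOP_SCORES (i.e. one that tolerates an approximate total hit count) lets the scorers skip non-competitive blocks:

```java
import java.nio.file.Paths;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.FSDirectory;

public class ConjunctionTopK {
  public static void main(String[] args) throws Exception {
    try (FSDirectory dir = FSDirectory.open(Paths.get("/path/to/index"));
         DirectoryReader reader = DirectoryReader.open(dir)) {
      IndexSearcher searcher = new IndexSearcher(reader);

      // A conjunction of two terms; field and term values are placeholders.
      BooleanQuery query = new BooleanQuery.Builder()
          .add(new TermQuery(new Term("body", "lucene")), BooleanClause.Occur.MUST)
          .add(new TermQuery(new Term("body", "search")), BooleanClause.Occur.MUST)
          .build();

      // Collecting only the top 10 hits with a bounded total-hit count lets the
      // collector advertise ScoreMode.TOP_SCORES, which is what allows scorers
      // to skip blocks whose maximum possible score is not competitive.
      TopScoreDocCollector collector = TopScoreDocCollector.create(10, 1000);
      searcher.search(query, collector);
      TopDocs top = collector.topDocs();
      System.out.println(top.totalHits);
    }
  }
}
```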
[1] https://home.apache.org/~mikemccand/lucenebench

On Thu, Jul 11, 2019 at 6:02 AM Atri Sharma <a...@linux.com> wrote:
> Note that any other scoring mode (COMPLETE or COMPLETE_NO_SCORES) will
> mandatorily visit all hits, so there is no scope for skipping and hence
> no point in using impacts.
>
> On Thu, Jul 11, 2019 at 8:51 AM Wu,Yunfeng <wuyunfen...@baidu.com> wrote:
> >
> > @Adrien Grand <jpou...@gmail.com>, thanks for your reply.
> >
> > The explanation "skip low-scoring matches" is great; I looked up some
> > docs and inspected some related code.
> >
> > I noticed that block-max WAND only works when ScoreMode.TOP_SCORES is
> > used, is that right? (The basic TermQuery generates an ImpactsDISI when
> > the score mode is TOP_SCORES.)
> >
> > Lucene computes a max score per block and caches it in MaxScoreCache.
> > This means we can skip a low-scoring block (currently one block is 128
> > doc IDs), while within a competitive block we still need to score every
> > doc ID we see. I am confused by
> > MaxScoreCache#getMaxScoreForLevel(int level): what does the level mean?
> > The skip level? (Somewhere this method is invoked with an integer upTo
> > parameter.)
> >
> > Thanks, Lucene team
> >
> > On Jul 10, 2019, at 10:52 PM, Adrien Grand <jpou...@gmail.com> wrote:
> >
> > To clarify, the scoring process is not accelerated because we
> > terminate early but because we can skip low-scoring matches (there
> > might be competitive hits at the very end of the index).
> >
> > CompetitiveImpactAccumulator is indeed related to WAND. It helps store
> > the maximum score impacts per block of documents in postings lists.
> > Then this information is leveraged by block-max WAND in order to skip
> > low-scoring blocks.
> >
> > This does indeed help avoid reading norms, but also document IDs and
> > term frequencies.
> >
> > On Wed, Jul 10, 2019 at 4:10 PM Wu,Yunfeng <wuyunfen...@baidu.com> wrote:
> >
> > Hi,
> >
> > We are discussing a topic from
> > https://github.com/apache/lucene-solr/pull/595. Atri Sharma proposed
> > bringing the discussion to the java dev list.
> >
> > Do impacts record frequency and norm just to accelerate the scoring
> > process so that it can terminate early?
> >
> > In impacts mode, CompetitiveImpactAccumulator records (freq, norm)
> > pairs, which are stored at index time. I also noted that
> > CompetitiveImpactAccumulator is documented with "This class accumulates
> > the (freq, norm) pairs that may produce competitive scores"; maybe that
> > is related to WAND?
> >
> > The norm value is produced and consumed by Lucene80NormsFormat.
> >
> > In this impacts mode, can we avoid reading norms from
> > Lucene80NormsProducer, which may generate extra IO (the norm value is
> > then stored twice by Lucene), and take full advantage of the WAND
> > method?
> >
> > --
> > Adrien
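A simplified, self-contained sketch of the block-level skipping discussed above: per-block max scores let a scorer jump over whole blocks whose best possible score is below the current minimum competitive score (the score of the worst hit currently in the top-k heap). The Block class, the advanceShallow name, and the numbers are made up for illustration; this is not Lucene's actual MaxScoreCache/WAND implementation:

```java
public class BlockMaxSkipSketch {

  /** Per-block metadata: last doc ID covered by the block and the max score inside it. */
  static final class Block {
    final int lastDoc;
    final float maxScore;
    Block(int lastDoc, float maxScore) { this.lastDoc = lastDoc; this.maxScore = maxScore; }
  }

  /**
   * Returns the first doc ID at or after {@code target} that is worth scoring,
   * skipping over whole blocks whose max score is below the current
   * minimum competitive score.
   */
  static int advanceShallow(Block[] blocks, int target, float minCompetitiveScore) {
    for (Block block : blocks) {
      if (block.lastDoc < target) {
        continue; // block ends before the target, nothing to do
      }
      if (block.maxScore >= minCompetitiveScore) {
        return target; // this block may contain a competitive hit: score docs in it
      }
      // Even the best doc in this block cannot make the top-k: jump past it.
      target = block.lastDoc + 1;
    }
    return Integer.MAX_VALUE; // exhausted: no remaining block can be competitive
  }

  public static void main(String[] args) {
    Block[] blocks = {
        new Block(127, 1.2f),  // docs 0..127, best possible score 1.2
        new Block(255, 0.4f),  // docs 128..255, best possible score 0.4
        new Block(383, 2.0f),  // docs 256..383, best possible score 2.0
    };
    // With a minimum competitive score of 1.5, the first two blocks are skipped entirely.
    System.out.println(advanceShallow(blocks, 0, 1.5f)); // prints 256
  }
}
```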