Re: Impact and WAND

Adrien Grand Wed, 10 Jul 2019 23:58:01 -0700

Block-max WAND and other optimizations that improve the retrieval of
top hits (block-max WAND is about disjunctions, but we have
optimizations for conjunctions, phrases and boolean queries that mix
MUST and SHOULD clauses too) are only applied when the score mode is
TOP_SCORES indeed.


The level in MaxScoreCache is indeed a skip level. Impacts are stored
alongside skip data.
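
For readers following along, here is a minimal, self-contained sketch of
the block-level skipping idea discussed in this thread. It is a toy, not
Lucene's implementation: the `Block`, `Hit`, `Result`, `topK` and
`exampleBlocks` names are hypothetical, scores are precomputed rather than
derived from (freq, norm) impacts, blocks hold two documents instead of 128,
and only a single postings list is handled (real block-max WAND combines
per-term maxima across the clauses of a disjunction). It only illustrates
that a block is skipped when its max score cannot beat the current k-th best
hit, and that a competitive hit may still appear in the last block.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class BlockMaxSkipSketch {

    /** Toy postings block; maxScore stands in for the impacts stored with skip data. */
    static final class Block {
        final int[] docIds;
        final float[] scores;
        final float maxScore;

        Block(int[] docIds, float[] scores) {
            this.docIds = docIds;
            this.scores = scores;
            float max = 0f;
            for (float s : scores) {
                max = Math.max(max, s);
            }
            this.maxScore = max;
        }
    }

    /** A collected hit: doc ID plus its score. */
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /** Top-k doc IDs (best first) and the number of documents actually scored. */
    static final class Result {
        final List<Integer> topDocs = new ArrayList<>();
        int scoredCount;
    }

    static Result topK(List<Block> blocks, int k) {
        Result result = new Result();
        if (k <= 0) {
            return result;
        }
        // Min-heap on score: the head is the weakest of the current top-k hits.
        PriorityQueue<Hit> heap =
            new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
        for (Block block : blocks) {
            // Once k hits are collected, a block whose max score cannot beat
            // the weakest collected hit is skipped without scoring its docs.
            if (heap.size() == k && block.maxScore <= heap.peek().score) {
                continue;
            }
            for (int i = 0; i < block.docIds.length; i++) {
                result.scoredCount++;
                float score = block.scores[i];
                if (heap.size() < k) {
                    heap.add(new Hit(block.docIds[i], score));
                } else if (score > heap.peek().score) {
                    heap.poll();
                    heap.add(new Hit(block.docIds[i], score));
                }
            }
        }
        // Drain the heap (weakest first), then reverse so the best hit comes first.
        while (!heap.isEmpty()) {
            result.topDocs.add(heap.poll().doc);
        }
        Collections.reverse(result.topDocs);
        return result;
    }

    /** Three toy blocks; the best-scoring document (doc 9) sits in the last one. */
    static List<Block> exampleBlocks() {
        return Arrays.asList(
            new Block(new int[] {1, 2}, new float[] {1.0f, 3.0f}),
            new Block(new int[] {5, 7}, new float[] {0.5f, 2.0f}),
            new Block(new int[] {9, 12}, new float[] {4.0f, 0.2f}));
    }

    public static void main(String[] args) {
        Result r = topK(exampleBlocks(), 1);
        System.out.println("top docs: " + r.topDocs + ", scored: " + r.scoredCount);
    }
}
```

With k = 1, running `main` on these toy blocks prints
`top docs: [9], scored: 4`: the middle block is skipped, yet the best hit is
still found in the last block, which is why skipping is not the same as
terminating early.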

On Thu, Jul 11, 2019 at 5:21 AM Wu,Yunfeng <[email protected]> wrote:
>
>
> @Adrien Grand <[email protected]>, thanks for your reply.
>
> The explanation `skip low-scoring matches` is great. I looked up some docs
> and inspected some related code.
>
> I noticed that `block-max WAND` only works when ScoreMode.TOP_SCORES is
> used, is that right? (A basic TermQuery would create an ImpactsDISI when
> the scoreMode is TOP_SCORES.)
>
> Lucene computes the max score per block and caches it in `MaxScoreCache`.
> This means we can skip low-scoring blocks (currently one block is 128 doc
> IDs), while in a competitive block we still need to score every doc ID we
> see. I am confused by `MaxScoreCache#getMaxScoreForLevel(int level)`: what
> does the level mean? A skip level? (Somewhere this method is invoked with
> an integer upTo param.)
>
> Thanks Lucene Team
>
>
> On Jul 10, 2019, at 10:52 PM, Adrien Grand <[email protected]> wrote:
>
> To clarify, the scoring process is not accelerated because we
> terminate early but because we can skip low-scoring matches (there
> might be competitive hits at the very end of the index).
>
> CompetitiveImpactAccumulator is indeed related to WAND. It helps store
> the maximum score impacts per block of documents in postings lists.
> Then this information is leveraged by block-max WAND in order to skip
> low-scoring blocks.
>
> This does indeed help avoid reading norms, but also document IDs and
> term frequencies.
>
> On Wed, Jul 10, 2019 at 4:10 PM Wu,Yunfeng <[email protected]> wrote:
>
> Hi,
>
> We discussed some topics in https://github.com/apache/lucene-solr/pull/595.
> As Atri Sharma proposed, I am bringing the discussion to the java dev list.
>
>
> The impact `frequency` and `norm` are there just to accelerate the
> `score process`, which can `terminate early`.
>
> In impact mode, `CompetitiveImpactAccumulator` records (freq, norm) pairs,
> which are stored at the index level. I also noted that
> `CompetitiveImpactAccumulator` is commented with `This class accumulates
> the (freq, norm) pairs that may produce competitive scores`; is it maybe
> related to `WAND`?
>
>
> The norm value is produced and consumed by `Lucene80NormsFormat`.
>
> With this `Impact` approach, can we avoid reading norms from
> `Lucene80NormsProducer`, which may generate extra IO (Lucene stores the
> norm value twice), and take full advantage of the WAND method?
>
>
>
> --
> Adrien
>


-- 
Adrien

