Re: Boolean Scorer

Greg Miller Tue, 15 Jun 2021 06:28:33 -0700

Thanks for this explanation, Adrien! I'd been wondering about this a bit
myself since seeing that DrillSideways also implements a TAAT approach (in
addition to a doc-at-a-time approach). This really helps clear that up.
Appreciate you taking the time to explain!


Cheers,
-Greg

On Mon, Jun 14, 2021 at 2:35 AM Adrien Grand <[email protected]> wrote:

> Hello Arihant,
>
> The Scorer for disjunctions uses a heap data structure that needs to be
> reordered upon every hit. While reordering heaps is efficient as it runs in
> logarithmic time, the fact that it needs to run on every document might add
> non-negligible overhead. BooleanScorer tries to work around this overhead
> by scoring large windows of documents in a more TAAT (term-at-a-time)
> fashion so that Lucene only needs to reorder the heap every 2048 doc IDs
> (the hardcoded window size).
>
> This paper gives a bit more context:
> http://www.savar.se/media/1181/space_optimizations_for_total_ranking.pdf,
> see section 4 in particular.
>
> On Sat, Jun 12, 2021 at 5:47 PM Arihant Samar <[email protected]> wrote:
>
>> Hi,
>>
>> I am new here. I would like to know what exact optimisation the
>> BooleanScorer.java code carries out that justifies a separate class for
>> resolving Boolean queries over documents in bulk. I could not find any
>> material on this in the documentation, hence I decided to ask here.
>>
>>
>> Thanking you in advance,
>>
>> Arihant.
>
>
> --
> Adrien
>
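For readers who want to see the trade-off Adrien describes, here is a minimal Python sketch. It is not Lucene's code: the function names, the plain sorted lists of doc IDs used as postings, and the toy score (the number of matching terms) are all assumptions made for illustration. Only the window size of 2048 comes from the thread. The first function merges postings doc-at-a-time and touches the heap on every posting; the second drains each term over a whole window before touching the heap again.

```python
import heapq

WINDOW_SIZE = 2048  # window size mentioned in the thread


def score_doc_at_a_time(postings):
    """Merge sorted postings lists with a heap, updating it on every posting."""
    heap = [(docs[0], term, 0) for term, docs in enumerate(postings) if docs]
    heapq.heapify(heap)
    scores, heap_ops = {}, 0
    while heap:
        doc, term, pos = heap[0]
        scores[doc] = scores.get(doc, 0) + 1  # toy score: matching term count
        pos += 1
        if pos < len(postings[term]):
            # Advance this term and restore heap order (logarithmic time).
            heapq.heapreplace(heap, (postings[term][pos], term, pos))
        else:
            heapq.heappop(heap)
        heap_ops += 1
    return scores, heap_ops


def score_in_windows(postings, window_size=WINDOW_SIZE):
    """Score one window of doc IDs at a time, term by term within the window."""
    heap = [(docs[0], term, 0) for term, docs in enumerate(postings) if docs]
    heapq.heapify(heap)
    scores, heap_ops = {}, 0
    while heap:
        # Window that contains the smallest pending doc ID.
        base = heap[0][0] - heap[0][0] % window_size
        end = base + window_size
        bucket = [0] * window_size  # per-window score accumulator
        while heap and heap[0][0] < end:
            _, term, pos = heapq.heappop(heap)
            heap_ops += 1
            docs = postings[term]
            # Term-at-a-time inside the window: no heap work in this loop.
            while pos < len(docs) and docs[pos] < end:
                bucket[docs[pos] - base] += 1
                pos += 1
            if pos < len(docs):
                # The next doc of this term lies beyond the current window.
                heapq.heappush(heap, (docs[pos], term, pos))
                heap_ops += 1
        for offset, score in enumerate(bucket):
            if score:
                scores[base + offset] = score
    return scores, heap_ops
```

Calling both functions on the same postings, for example `[list(range(0, 10000, 3)), list(range(0, 10000, 5))]`, should return identical score maps, while the second element of each result (the number of heap operations) is much smaller for the windowed version because it grows with the number of windows rather than the number of postings.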
