Re: Boolean Scorer

Adrien Grand Tue, 15 Jun 2021 06:35:19 -0700

Glad it helped. :)

On Tue, Jun 15, 2021 at 3:28 PM Greg Miller <gsmil...@gmail.com> wrote:


> Thanks for this explanation Adrien! I'd been wondering about this a bit
> myself since seeing that DrillSideways also implements a TAAT approach (in
> addition to a doc-at-a-time approach). This really helps clear that up.
> Appreciate you taking the time to explain!
>
> Cheers,
> -Greg
>
> On Mon, Jun 14, 2021 at 2:35 AM Adrien Grand <jpou...@gmail.com> wrote:
>
>> Hello Arihant,
>>
>> The Scorer for disjunctions uses a heap data structure that needs to be
>> reordered upon every hit. Although reordering a heap is efficient, since it
>> runs in time logarithmic in the number of clauses, having to do it for
>> every matching document can add non-negligible overhead. BooleanScorer
>> tries to work around this overhead by scoring large windows of documents
>> in a more TAAT (term-at-a-time) fashion, so that Lucene only needs to
>> reorder the heap every 2048 doc IDs (the hardcoded window size).
>>
>> This paper gives a bit more context:
>> http://www.savar.se/media/1181/space_optimizations_for_total_ranking.pdf,
>> see section 4 in particular.
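
The windowing idea described above can be sketched roughly as follows. This
is a toy illustration, not Lucene's BooleanScorer code: the class name, the
Cursor type, and the use of a match count in place of a real score are all
assumptions made for the example. Only the window size of 2048 comes from
the explanation above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class WindowedDisjunctionSketch {
    // Window size mentioned in the thread; must be a power of two for the mask below.
    static final int WINDOW_SIZE = 2048;

    /** Hypothetical stand-in for a postings iterator over sorted, non-negative doc IDs. */
    static final class Cursor {
        final int[] docs;
        int pos = 0;
        Cursor(int[] docs) { this.docs = docs; }
        boolean exhausted() { return pos >= docs.length; }
        int doc() { return docs[pos]; }
    }

    /** Returns {docId, matchCount} pairs in doc ID order; matchCount stands in for a score. */
    static List<int[]> score(int[][] postings) {
        // Heap of clauses ordered by their current doc ID; only non-exhausted cursors live here.
        PriorityQueue<Cursor> heap = new PriorityQueue<>(Comparator.comparingInt(Cursor::doc));
        for (int[] p : postings) {
            if (p.length > 0) {
                heap.add(new Cursor(p));
            }
        }
        List<int[]> hits = new ArrayList<>();
        int[] counts = new int[WINDOW_SIZE]; // one bucket per doc ID in the current window
        while (!heap.isEmpty()) {
            // Align the window on the smallest pending doc ID.
            int windowBase = heap.peek().doc() & ~(WINDOW_SIZE - 1);
            long windowEnd = (long) windowBase + WINDOW_SIZE;
            // Term-at-a-time inside the window: each clause is drained in turn,
            // so the heap is touched once per clause per window, not once per hit.
            while (!heap.isEmpty() && heap.peek().doc() < windowEnd) {
                Cursor c = heap.poll();
                while (!c.exhausted() && c.doc() < windowEnd) {
                    counts[c.doc() - windowBase]++;
                    c.pos++;
                }
                if (!c.exhausted()) {
                    heap.add(c); // its next doc lies beyond this window
                }
            }
            // Emit the window's hits in doc ID order and reset the buckets.
            for (int i = 0; i < WINDOW_SIZE; i++) {
                if (counts[i] > 0) {
                    hits.add(new int[] {windowBase + i, counts[i]});
                    counts[i] = 0;
                }
            }
        }
        return hits;
    }
}
```

With postings {1, 5, 3000} and {5, 2047, 2048, 3000}, the sketch processes
two windows ([0, 2048) and [2048, 4096)) and performs heap operations only
when a clause enters or leaves a window, instead of on each of the seven
postings entries.
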
>>
>> On Sat, Jun 12, 2021 at 5:47 PM Arihant Samar <arisam...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I am new here. I would like to know exactly what optimisation is
>>> carried out in "BooleanScorer.java" that led to a separate class for
>>> scoring Boolean queries over documents in bulk. I could not find any
>>> material on this in the documentation, hence I decided to ask here.
>>>
>>>
>>> Thanking you in advance,
>>>
>>> Arihant.
>>>
>>
>>
>> --
>> Adrien
>>
>

-- 
Adrien
