Hi,
There is a function "ScoreWindowIntoBitSetAndReplay" in
"BooleanScorer.java" which runs over all the scorers.
I was wondering whether we could multi-thread this loop, with one thread
per scorer (numScorers threads in total). We are already using a special
OrCollector here, which updates the matching array and the scores in
buckets of 2048 docs, so a ReentrantLock in the collector could provide
the synchronization.
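
To make the idea concrete, here is a minimal, self-contained sketch of what
I have in mind. It is not Lucene's code: WindowScoringSketch, TermPostings,
OrBuckets, Hit and scoreWindow are hypothetical names for illustration, and
each "scorer" is simplified to a sorted array of doc IDs with a constant
score. Only the 2048-doc window and the matching/score buckets come from
the description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

final class WindowScoringSketch {
    static final int SHIFT = 11;
    static final int SIZE = 1 << SHIFT; // 2048 docs per window
    static final int MASK = SIZE - 1;

    /** Hypothetical stand-in for one clause's scorer: sorted doc IDs, constant score. */
    static final class TermPostings {
        final int[] docs;
        final float score;
        TermPostings(int[] docs, float score) { this.docs = docs; this.score = score; }
    }

    /** One matching document produced by the replay step. */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    /** Shared per-window state; every update goes through a single ReentrantLock. */
    static final class OrBuckets {
        final long[] matching = new long[SIZE / 64]; // one bit per doc in the window
        final float[] scores = new float[SIZE];      // summed score per doc in the window
        private final ReentrantLock lock = new ReentrantLock();

        void collect(int doc, float score) {
            int i = doc & MASK; // slot of this doc inside the window
            lock.lock();
            try {
                matching[i >>> 6] |= 1L << i; // Java uses only the low 6 bits of i here
                scores[i] += score;
            } finally {
                lock.unlock();
            }
        }
    }

    /** Scores [windowBase, windowBase + SIZE) with one thread per clause, then replays. */
    static List<Hit> scoreWindow(List<TermPostings> clauses, int windowBase) {
        OrBuckets buckets = new OrBuckets();
        int windowMax = windowBase + SIZE;
        List<Thread> threads = new ArrayList<>();
        for (TermPostings clause : clauses) {
            Thread t = new Thread(() -> {
                for (int doc : clause.docs) {
                    if (doc >= windowMax) break; // docs are sorted: rest is outside the window
                    if (doc >= windowBase) buckets.collect(doc, clause.score);
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join(); // join() also makes the workers' writes visible to this thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while scoring window", e);
            }
        }
        // Replay: walk the bit set so hits come out in increasing doc ID order.
        List<Hit> hits = new ArrayList<>();
        for (int w = 0; w < buckets.matching.length; w++) {
            long bits = buckets.matching[w];
            while (bits != 0L) {
                int ntz = Long.numberOfTrailingZeros(bits);
                int i = (w << 6) | ntz;
                hits.add(new Hit(windowBase + i, buckets.scores[i]));
                bits ^= 1L << ntz; // clear the bit we just replayed
            }
        }
        return hits;
    }
}
```

In this sketch the lock serializes every bucket update, so the threads only
overlap while walking their own doc ID arrays. Also, float addition is not
associative, so with real scores the order in which threads reach the lock
could change the low bits of a summed score from run to run.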

I would like feedback on this, because I tried it and some tests did not
pass. If you could point out what is wrong with this approach, I
would appreciate it.

Thanking you in advance,
Arihant.

On Tue, 15 Jun 2021, 19:05 Adrien Grand, <[email protected]> wrote:

> Glad it helped. :)
>
> On Tue, Jun 15, 2021 at 3:28 PM Greg Miller <[email protected]> wrote:
>
>> Thanks for this explanation Adrien! I'd been wondering about this a bit
>> myself since seeing that DrillSideways also implements a TAAT approach (in
>> addition to a doc-at-a-time approach). This really helps clear that up.
>> Appreciate you taking the time to explain!
>>
>> Cheers,
>> -Greg
>>
>> On Mon, Jun 14, 2021 at 2:35 AM Adrien Grand <[email protected]> wrote:
>>
>>> Hello Arihant,
>>>
>>> The Scorer for disjunctions uses a heap data structure that needs to be
>>> reordered upon every hit. While reordering heaps is efficient as it runs in
>>> logarithmic time, the fact that it needs to run on every document might add
>>> non-negligible overhead. BooleanScorer tries to work around this overhead
>>> by scoring large windows of documents in a more TAAT (term-at-a-time)
>>> fashion so that Lucene only needs to reorder the heap every 2048 doc IDs
>>> (the hardcoded window size).
>>>
>>> This paper gives a bit more context:
>>> http://www.savar.se/media/1181/space_optimizations_for_total_ranking.pdf,
>>> see section 4 in particular.
>>>
>>> On Sat, Jun 12, 2021 at 5:47 PM Arihant Samar <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am new here. I would like to know exactly what optimisation is
>>>> carried out in "BooleanScorer.java" that led to a separate class for
>>>> evaluating Boolean queries over documents in bulk. I could not find any
>>>> material on this in the documentation, so I decided to ask here.
>>>>
>>>>
>>>> Thanking you in advance,
>>>>
>>>> Arihant.
>>>>
>>>>
>>>
>>>
>>> --
>>> Adrien
>>>
>>
>
> --
> Adrien
>
