Re: Boolean Scorer

Adrien Grand Mon, 21 Jun 2021 06:34:39 -0700

It should be possible to make something like this work. The main issue is
that Lucene expects a (Bulk)Scorer to be consumed in the thread where it
was pulled, so I believe this would require substantial changes to how
BooleanScorer currently operates.


I'd be curious to know why you are looking into this rather than passing an
Executor to IndexSearcher so that it can search segments concurrently. Is
it not providing enough concurrency for you?
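
For illustration, here is a rough model of what segment-level concurrency
amounts to, written in plain Java with no Lucene classes. In Lucene itself
the entry point is the IndexSearcher constructor that accepts an Executor;
everything named below (SegmentConcurrencySketch, countMatches, search,
segments modelled as int arrays) is hypothetical. The point it tries to show
is that each segment is still consumed on a single thread, so nothing has to
be shared or locked while matching.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

// Hypothetical, Lucene-free model: a "segment" is just an array of term IDs.
final class SegmentConcurrencySketch {

    /** Counts matches in one segment; runs entirely on the thread that executes the task. */
    static int countMatches(int[] segment, int term) {
        int count = 0;
        for (int value : segment) {
            if (value == term) {
                count++;
            }
        }
        return count;
    }

    /** Submits one task per segment, then merges the per-segment counts on the calling thread. */
    static int search(List<int[]> segments, int term, Executor executor) {
        List<CompletableFuture<Integer>> futures = new ArrayList<>();
        for (int[] segment : segments) {
            futures.add(CompletableFuture.supplyAsync(() -> countMatches(segment, term), executor));
        }
        int total = 0;
        for (CompletableFuture<Integer> future : futures) {
            total += future.join(); // waits for that segment's task to finish
        }
        return total;
    }
}
```

Concurrency here comes from having several segments, not from splitting the
work inside one segment, which is why it does not conflict with the
one-thread-per-scorer expectation.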

On Mon, Jun 21, 2021 at 9:46 AM Arihant Samar <[email protected]> wrote:

> Hi,
> There is a function "scoreWindowIntoBitSetAndReplay" in
> "BooleanScorer.java" which runs over all the scorers.
> I was wondering whether we could use multi-threading here, with numScorers
> threads. We already use a special OrCollector here, which updates the
> matching array and the scores in buckets of 2048 docs, so we could use a
> ReentrantLock for synchronization in the collector.
>
> I would like feedback on this, since I tried it and some tests did not
> pass. If you could tell me what is wrong with this approach, I
> would appreciate it.
>
> Thanking you in advance,
> Arihant.
>
> On Tue, 15 Jun 2021, 19:05 Adrien Grand, <[email protected]> wrote:
>
>> Glad it helped. :)
>>
>> On Tue, Jun 15, 2021 at 3:28 PM Greg Miller <[email protected]> wrote:
>>
>>> Thanks for this explanation Adrien! I'd been wondering about this a bit
>>> myself since seeing that DrillSideways also implements a TAAT approach (in
>>> addition to a doc-at-a-time approach). This really helps clear that up.
>>> Appreciate you taking the time to explain!
>>>
>>> Cheers,
>>> -Greg
>>>
>>> On Mon, Jun 14, 2021 at 2:35 AM Adrien Grand <[email protected]> wrote:
>>>
>>>> Hello Arihant,
>>>>
>>>> The Scorer for disjunctions uses a heap data structure that needs to be
>>>> reordered upon every hit. While reordering heaps is efficient as it runs in
>>>> logarithmic time, the fact that it needs to run on every document might add
>>>> non-negligible overhead. BooleanScorer tries to work around this overhead
>>>> by scoring large windows of documents in a more TAAT (term-at-a-time)
>>>> fashion so that Lucene only needs to reorder the heap every 2048 doc IDs
>>>> (the hardcoded window size).
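
A rough sketch of the windowed scoring described just above, in plain Java
without Lucene. Postings, scoreAll and the "doc:score" output are
hypothetical names made up for illustration, not BooleanScorer's actual
code, and the sketch leaves the heap out entirely by visiting every window
in turn. Only the 2048 window size and the idea of per-window buckets
(a matching array plus scores) come from this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Lucene-free model of term-at-a-time scoring inside fixed windows.
final class WindowedOrSketch {
    static final int WINDOW_SIZE = 2048; // window size mentioned in the thread

    /** Stand-in for one clause: sorted doc IDs with a parallel array of scores. */
    static final class Postings {
        final int[] docs;
        final float[] scores;
        int pos = 0; // cursor, advanced as windows are consumed

        Postings(int[] docs, float[] scores) {
            this.docs = docs;
            this.scores = scores;
        }
    }

    /** Scores the disjunction window by window; returns "doc:score" strings in doc ID order. */
    static List<String> scoreAll(List<Postings> clauses, int maxDoc) {
        List<String> hits = new ArrayList<>();
        boolean[] matching = new boolean[WINDOW_SIZE];
        float[] bucketScores = new float[WINDOW_SIZE];
        for (int base = 0; base < maxDoc; base += WINDOW_SIZE) {
            int end = Math.min(base + WINDOW_SIZE, maxDoc);
            // Term-at-a-time inside the window: drain each clause in turn into the buckets.
            for (Postings p : clauses) {
                while (p.pos < p.docs.length && p.docs[p.pos] < end) {
                    int slot = p.docs[p.pos] - base;
                    matching[slot] = true;
                    bucketScores[slot] += p.scores[p.pos];
                    p.pos++;
                }
            }
            // Replay the window in doc ID order, resetting each bucket as it is emitted.
            for (int slot = 0; slot < end - base; slot++) {
                if (matching[slot]) {
                    hits.add((base + slot) + ":" + bucketScores[slot]);
                    matching[slot] = false;
                    bucketScores[slot] = 0f;
                }
            }
        }
        return hits;
    }
}
```

Because each clause is drained into the buckets one after another, nothing
is reordered per document inside a window; the price is the extra pass that
replays the buckets in doc ID order.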
>>>>
>>>> This paper gives a bit more context:
>>>> http://www.savar.se/media/1181/space_optimizations_for_total_ranking.pdf,
>>>> see section 4 in particular.
>>>>
>>>> On Sat, Jun 12, 2021 at 5:47 PM Arihant Samar <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am new here. I would like to know what exact optimisation is
>>>>> carried out in “BooleanScorer.java” that led to a separate class for
>>>>> resolving boolean queries over documents in bulk. I could not find any
>>>>> material on this in the documentation either, hence I decided to ask here.
>>>>>
>>>>>
>>>>> Thanking you in advance,
>>>>>
>>>>> Arihant.
>>>>>
>>>>
>>>>
>>>> --
>>>> Adrien
>>>>
>>>
>>
>> --
>> Adrien
>>
>

-- 
Adrien
