Re: Boolean Scorer

Arihant Samar Thu, 24 Jun 2021 04:20:18 -0700

I managed to correct some mistakes, and the tests that check scores are now
passing. As expected, the tests that assert documents are generated and
collected in the same thread still fail, so out of interest I removed those
asserts. Are there any tests or benchmarks I can use to measure how these
changes perform?


Thanking you in advance,
Arihant.

On Tue, 22 Jun 2021 at 11:37, Arihant Samar <[email protected]> wrote:

> There was a Jira issue about GPU acceleration which mentioned that
> Boolean Scorer is a candidate for GPU usage.
> So I wanted to try multithreading in Java itself first, and this function
> looked amenable to parallelization.
> Hence I was just giving it a try.
> Would this not be useful for very long Boolean queries with a lot of
> SHOULD clauses? I have no clue whether this is a common situation, though.
>
> I need a little more help. Some of the tests fail with the error Adrien
> mentioned (docs should be collected in the same thread in which they were
> generated), but other tests report wrong scores. Do you see anything wrong
> in the synchronization I have done?
> I create an array of matching.length ReentrantLocks and run the function
> "ScoreWindowIntoBitSetAndReplay" with numScorers threads instead of the for
> loop.
> /// in BooleanScorer.java -> OrCollector -> collect function
> Lock[idx].lock();
> matching[idx] |= 1L << i;
> final Bucket bucket = buckets[i];
> bucket.freq++;
> bucket.score += scorer.score();
> Lock[idx].unlock();
>
>
>
> On Mon, 21 Jun 2021 at 19:04, Adrien Grand <[email protected]> wrote:
>
>> It should be possible to make something like this work. The main issue is
>> that Lucene has the expectation that a (Bulk)Scorer is consumed in the
>> thread where it was pulled, so this would require substantial changes to
>> how BooleanScorer currently operates I believe.
>>
>> I'd be curious to know why you are looking into this rather than passing
>> an Executor to IndexSearcher so that it can search segments concurrently.
>> Is it not providing enough concurrency for you?
>>
>> On Mon, Jun 21, 2021 at 9:46 AM Arihant Samar <[email protected]>
>> wrote:
>>
>>> Hi,
>>> There is a function "ScoreWindowIntoBitSetAndReplay" in
>>> "BooleanScorer.java" which loops over all the scorers.
>>> I was wondering whether we could use multithreading here, with numScorers
>>> threads. We already use a special OrCollector here, which updates the
>>> matching array and the scores in the buckets for a window of 2048 docs, so
>>> we could use a ReentrantLock for synchronization in the collector.
>>>
>>> I would like feedback on this, since I tried it and some tests did not
>>> pass. If you could tell me what is wrong with this approach, I
>>> would appreciate it.
>>>
>>> Thanking You in advance,
>>> Arihant.
>>>
>>> On Tue, 15 Jun 2021, 19:05 Adrien Grand, <[email protected]> wrote:
>>>
>>>> Glad it helped. :)
>>>>
>>>> On Tue, Jun 15, 2021 at 3:28 PM Greg Miller <[email protected]> wrote:
>>>>
>>>>> Thanks for this explanation Adrien! I'd been wondering about this a
>>>>> bit myself since seeing that DrillSideways also implements a TAAT approach
>>>>> (in addition to a doc-at-a-time approach). This really helps clear that up.
>>>>> Appreciate you taking the time to explain!
>>>>>
>>>>> Cheers,
>>>>> -Greg
>>>>>
>>>>> On Mon, Jun 14, 2021 at 2:35 AM Adrien Grand <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hello Arihant,
>>>>>>
>>>>>> The Scorer for disjunctions uses a heap data structure that needs to
>>>>>> be reordered upon every hit. While reordering heaps is efficient as it
>>>>>> runs in logarithmic time, the fact that it needs to run on every document
>>>>>> might add non-negligible overhead. BooleanScorer tries to work around this
>>>>>> overhead by scoring large windows of documents in a more TAAT
>>>>>> (term-at-a-time) fashion so that Lucene only needs to reorder the heap
>>>>>> every 2048 doc IDs (the hardcoded window size).
>>>>>>
>>>>>> This paper gives a bit more context:
>>>>>> http://www.savar.se/media/1181/space_optimizations_for_total_ranking.pdf,
>>>>>> see section 4 in particular.
>>>>>>
>>>>>> On Sat, Jun 12, 2021 at 5:47 PM Arihant Samar <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am new here. I would like to know what exact optimisation is
>>>>>>> carried out in the "BooleanScorer.java" code that led to a separate
>>>>>>> class for resolving Boolean queries over documents in bulk. I could not
>>>>>>> find any material on this in the documentation, hence I decided to ask
>>>>>>> here.
>>>>>>>
>>>>>>>
>>>>>>> Thanking you in advance,
>>>>>>>
>>>>>>> Arihant.
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Adrien
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Adrien
>>>>
>>>
>>
>> --
>> Adrien
>>
>
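
For readers following the thread, the windowed term-at-a-time idea that Adrien
describes in his 14 Jun 2021 message can be sketched roughly as below. This is
a simplified illustration, not Lucene's BooleanScorer: the class
WindowedOrSketch, the Clause type, and the constant per-clause score are all
hypothetical stand-ins, and it assumes every clause has already been advanced
to the start of the window and that the window base is a multiple of 2048.

```java
import java.util.ArrayList;
import java.util.List;

public class WindowedOrSketch {
    static final int SHIFT = 11;
    static final int SIZE = 1 << SHIFT;      // 2048 docs per window
    static final int MASK = SIZE - 1;
    static final int SET_SIZE = SIZE >>> 6;  // 32 longs cover 2048 bits

    /** Hypothetical stand-in for one SHOULD clause: sorted doc IDs plus a constant score. */
    static final class Clause {
        final int[] docs;
        final double score;
        int pos = 0; // next unread position in docs
        Clause(int[] docs, double score) { this.docs = docs; this.score = score; }
    }

    /**
     * Scores the window [base, base + SIZE) and returns one line per matching doc,
     * in increasing doc ID order. Assumes base is a multiple of SIZE and that no
     * clause is positioned on a doc ID below base.
     */
    static List<String> scoreWindow(Clause[] clauses, int base) {
        long[] matching = new long[SET_SIZE];
        int[] freq = new int[SIZE];
        double[] score = new double[SIZE];
        int max = base + SIZE;

        // Term-at-a-time phase: drain each clause over the window in turn.
        // No heap is touched while this loop runs.
        for (Clause c : clauses) {
            while (c.pos < c.docs.length && c.docs[c.pos] < max) {
                int i = c.docs[c.pos++] & MASK;
                matching[i >>> 6] |= 1L << i; // Java uses only the low 6 bits of the shift count
                freq[i]++;
                score[i] += c.score;
            }
        }

        // Replay phase: walk the bit set so hits come out in doc ID order.
        List<String> hits = new ArrayList<>();
        for (int idx = 0; idx < SET_SIZE; idx++) {
            long bits = matching[idx];
            while (bits != 0L) {
                int ntz = Long.numberOfTrailingZeros(bits);
                int i = (idx << 6) | ntz;
                hits.add((base | i) + " freq=" + freq[i] + " score=" + score[i]);
                bits ^= 1L << ntz;
            }
        }
        return hits;
    }
}
```

Calling scoreWindow once per window (base 0, then 2048, and so on) means any
reordering of the clauses only has to happen between windows rather than on
every document.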

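The locking scheme discussed in the 22 Jun 2021 message (one ReentrantLock per
64-bit word of the matching array, one thread per clause) can be tried in
isolation with a sketch like the following. Everything here is a hypothetical
stand-in rather than Lucene code: StripedWindowSketch, collect and
scoreInParallel are illustrative names, clauses are plain int arrays of
window-relative doc IDs in [0, 2048), and each clause contributes a constant
score. It only demonstrates the lock/try/finally pattern; it does not address
the expectation Adrien mentions that a (Bulk)Scorer is consumed in the thread
where it was pulled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.locks.ReentrantLock;

public class StripedWindowSketch {
    static final int SIZE = 2048;
    static final int SET_SIZE = SIZE >>> 6; // 32 words, so 32 locks

    final long[] matching = new long[SET_SIZE];
    final int[] freq = new int[SIZE];
    final double[] score = new double[SIZE];
    final ReentrantLock[] locks = new ReentrantLock[SET_SIZE];

    StripedWindowSketch() {
        for (int k = 0; k < SET_SIZE; k++) {
            locks[k] = new ReentrantLock();
        }
    }

    /** Records one hit for window-relative doc i; the lock for word i >>> 6 guards bucket i too. */
    void collect(int i, double s) {
        int idx = i >>> 6;
        locks[idx].lock();
        try {
            matching[idx] |= 1L << i;
            freq[i]++;
            score[i] += s;
        } finally {
            locks[idx].unlock(); // released even if the body throws
        }
    }

    /** Runs one task per clause and waits for all of them before returning the filled window. */
    static StripedWindowSketch scoreInParallel(int[][] clauseDocs, double[] clauseScores) {
        StripedWindowSketch w = new StripedWindowSketch();
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, clauseDocs.length));
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (int c = 0; c < clauseDocs.length; c++) {
                final int[] docs = clauseDocs[c];
                final double s = clauseScores[c];
                tasks.add(() -> {
                    for (int doc : docs) {
                        w.collect(doc, s);
                    }
                    return null;
                });
            }
            for (Future<Void> f : pool.invokeAll(tasks)) {
                f.get(); // propagates worker failures and makes their writes visible here
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return w;
    }
}
```

Note that with this scheme the order in which clause scores are added to a
bucket depends on thread scheduling. Floating-point addition is not
associative, so sums of non-trivial scores may differ in their lowest bits
from a sequential run.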