I managed to correct some mistakes, and now the tests that check scores are passing. As expected, the tests that check that docs are collected in the same thread in which they were generated fail, but just out of interest I removed those asserts. Are there any tests or benchmarks I can use to compare how these changes perform?
Thanking you in advance,
Arihant.

On Tue, 22 Jun 2021 at 11:37, Arihant Samar <[email protected]> wrote:
> There was a Jira issue about GPU acceleration which mentioned that BooleanScorer could potentially make use of a GPU. So I was first checking multithreading in Java itself, and thought that this function may be amenable to parallelization. Hence I was just giving it a try. Would this not be useful for very long Boolean queries with a lot of SHOULD clauses? I have no idea whether that is a common situation, though.
>
> I just need one more bit of help. Some of the tests do fail with the error Adrien mentioned, that docs should be collected in the same thread in which they were generated, but some tests also produce wrong scores. Do you see anything wrong in the synchronization I have done?
> The synchronization I have done is basically to create an array of matching.length ReentrantLocks and to run the function "scoreWindowIntoBitSetAndReplay" with numScorers threads instead of the for loop.
> /// in BooleanScorer.java -> OrCollector -> collect function
> Lock[idx].lock();
> matching[idx] |= 1L << i;
> final Bucket bucket = buckets[i];
> bucket.freq++;
> bucket.score += scorer.score();
> Lock[idx].unlock();
>
> On Mon, 21 Jun 2021 at 19:04, Adrien Grand <[email protected]> wrote:
>
>> It should be possible to make something like this work. The main issue is that Lucene has the expectation that a (Bulk)Scorer is consumed in the thread where it was pulled, so this would require substantial changes to how BooleanScorer currently operates, I believe.
>>
>> I'd be curious to know why you are looking into this rather than passing an Executor to IndexSearcher so that it can search segments concurrently. Is it not providing enough concurrency for you?
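The following is a minimal standalone sketch of the striped-lock OrCollector idea described in this message. It is written in plain Java rather than against Lucene's classes. All names (`StripedOrCollectorSketch`, `Clause`, `collect(int, double)`, `run`) are hypothetical. The per-window arrays mirror the `matching`/bucket layout discussed in the thread, with one `ReentrantLock` per 64-bit word of `matching`; that lock also guards the 64 buckets the word covers. The sketch makes two assumptions that differ from the snippet. First, each thread passes its own score into `collect` instead of reading it through a shared field. Second, the unlock sits in a `finally` block.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.locks.ReentrantLock;

public class StripedOrCollectorSketch {
    static final int SIZE = 1 << 11; // 2048-doc window
    static final int MASK = SIZE - 1;

    // Per-window state: one bit per doc, plus a freq and a score per bucket.
    final long[] matching = new long[SIZE >>> 6];
    final int[] freq = new int[SIZE];
    final double[] score = new double[SIZE];
    // One lock per 64-bit word; it also guards the 64 buckets that word covers.
    final ReentrantLock[] locks = new ReentrantLock[matching.length];

    StripedOrCollectorSketch() {
        for (int k = 0; k < locks.length; k++) {
            locks[k] = new ReentrantLock();
        }
    }

    // The caller passes its own score, so no thread reads a score through shared state.
    void collect(int doc, double docScore) {
        final int i = doc & MASK;
        final int idx = i >>> 6;
        locks[idx].lock();
        try {
            matching[idx] |= 1L << i; // Java uses only the low 6 bits of the shift count
            freq[i]++;
            score[i] += docScore;
        } finally {
            locks[idx].unlock(); // released even if the update throws
        }
    }

    // Hypothetical clause: sorted doc IDs inside one window, with one score per doc.
    static final class Clause {
        final int[] docs;
        final double[] scores;

        Clause(int[] docs, double[] scores) {
            this.docs = docs;
            this.scores = scores;
        }
    }

    // Feeds every clause into one shared collector, one task per clause.
    static StripedOrCollectorSketch run(List<Clause> clauses, int threads) {
        StripedOrCollectorSketch c = new StripedOrCollectorSketch();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (Clause clause : clauses) {
                futures.add(pool.submit(() -> {
                    for (int j = 0; j < clause.docs.length; j++) {
                        c.collect(clause.docs[j], clause.scores[j]);
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // waits for the task and rethrows its failure, if any
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return c;
    }

    public static void main(String[] args) {
        List<Clause> clauses = List.of(
            new Clause(new int[] {1, 5, 64, 700}, new double[] {0.5, 1.0, 0.25, 2.0}),
            new Clause(new int[] {5, 64, 2047}, new double[] {0.1, 0.2, 0.3}),
            new Clause(new int[] {1, 700, 2047}, new double[] {1.5, 0.7, 0.9}));
        StripedOrCollectorSketch c = run(clauses, 3);
        System.out.println("doc 5: freq=" + c.freq[5] + ", score=" + c.score[5]);
    }
}
```

Even with correct locking, the order in which clauses add to a bucket now varies from run to run. Floating-point sums can therefore differ from the sequential result in the last bits, so any comparison against single-threaded scores needs a tolerance rather than exact equality.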
>>
>> On Mon, Jun 21, 2021 at 9:46 AM Arihant Samar <[email protected]> wrote:
>>
>>> Hi,
>>> There is a function "scoreWindowIntoBitSetAndReplay" in "BooleanScorer.java" which runs over all the scorers. I was wondering if we can use multithreading here with numScorers threads. We are using a special OrCollector here anyway, which updates the matching array and the scores in the buckets of 2048 docs, so we can use a ReentrantLock for synchronization in the collector.
>>>
>>> I just wanted reviews on this, since I tried it and some tests were not passing. If you could tell me what is wrong with this approach, I would appreciate it.
>>>
>>> Thanking you in advance,
>>> Arihant.
>>>
>>> On Tue, 15 Jun 2021, 19:05 Adrien Grand, <[email protected]> wrote:
>>>
>>>> Glad it helped. :)
>>>>
>>>> On Tue, Jun 15, 2021 at 3:28 PM Greg Miller <[email protected]> wrote:
>>>>
>>>>> Thanks for this explanation, Adrien! I'd been wondering about this a bit myself since seeing that DrillSideways also implements a TAAT approach (in addition to a doc-at-a-time approach). This really helps clear that up. Appreciate you taking the time to explain!
>>>>>
>>>>> Cheers,
>>>>> -Greg
>>>>>
>>>>> On Mon, Jun 14, 2021 at 2:35 AM Adrien Grand <[email protected]> wrote:
>>>>>
>>>>>> Hello Arihant,
>>>>>>
>>>>>> The Scorer for disjunctions uses a heap data structure that needs to be reordered upon every hit. While reordering a heap is efficient, as it runs in logarithmic time, the fact that it needs to run on every document might add non-negligible overhead. BooleanScorer tries to work around this overhead by scoring large windows of documents in a more TAAT (term-at-a-time) fashion, so that Lucene only needs to reorder the heap every 2048 doc IDs (the hardcoded window size).
>>>>>>
>>>>>> This paper gives a bit more context: http://www.savar.se/media/1181/space_optimizations_for_total_ranking.pdf, see section 4 in particular.
>>>>>>
>>>>>> On Sat, Jun 12, 2021 at 5:47 PM Arihant Samar <[email protected]> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am new here. I would like to know exactly what optimisation is carried out in "BooleanScorer.java" that led to a separate class for resolving Boolean queries over bulk documents. I could not find any material on this in the documentation either, hence I decided to ask here.
>>>>>>>
>>>>>>> Thanking you in advance,
>>>>>>>
>>>>>>> Arihant.
>>>>>>
>>>>>> --
>>>>>> Adrien
>>>>>
>>>>
>>>> --
>>>> Adrien
>>>
>>
>> --
>> Adrien
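The windowing trick Adrien describes can be illustrated with a simplified standalone Java sketch, which contrasts two methods:

- `daat` scores a disjunction doc-at-a-time through a `PriorityQueue` that is polled and re-inserted for every hit.
- `windowed` scores each clause over a whole 2048-doc window into a `matching` bitset and per-doc buckets, then replays the matches in doc-ID order.

All names here are hypothetical, and the sketch leaves out most of the real BooleanScorer. For example, it walks windows linearly instead of using a heap to jump to the next non-empty window, and it ignores minimum-should-match, freqs and collectors. It is only meant to show that the two strategies produce the same scores while spending their per-hit work differently.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

public class WindowedDisjunctionSketch {
    static final int SIZE = 1 << 11; // hardcoded 2048-doc window
    static final int MASK = SIZE - 1;

    // Hypothetical postings list: sorted doc IDs, one score per doc.
    static final class Postings {
        final int[] docs;
        final float[] scores;

        Postings(int[] docs, float[] scores) {
            this.docs = docs;
            this.scores = scores;
        }
    }

    // Doc-at-a-time: a min-heap of {clause, position} ordered by current doc ID,
    // polled and re-inserted for every posting.
    static double[] daat(Postings[] clauses, int maxDoc) {
        double[] out = new double[maxDoc];
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (x, y) -> Integer.compare(clauses[x[0]].docs[x[1]], clauses[y[0]].docs[y[1]]));
        for (int c = 0; c < clauses.length; c++) {
            if (clauses[c].docs.length > 0) {
                heap.add(new int[] {c, 0});
            }
        }
        while (!heap.isEmpty()) {
            int[] top = heap.poll(); // O(log n) heap reordering per hit
            Postings p = clauses[top[0]];
            out[p.docs[top[1]]] += p.scores[top[1]];
            if (++top[1] < p.docs.length) {
                heap.add(top); // re-insert with its next doc
            }
        }
        return out;
    }

    // Window-at-a-time: each clause scores a whole window into bitset + buckets,
    // then matches are replayed in doc-ID order.
    static double[] windowed(Postings[] clauses, int maxDoc) {
        double[] out = new double[maxDoc];
        long[] matching = new long[SIZE >>> 6];
        double[] buckets = new double[SIZE];
        int[] pos = new int[clauses.length]; // next unread posting per clause
        for (int base = 0; base < maxDoc; base += SIZE) {
            int max = base + SIZE;
            for (int c = 0; c < clauses.length; c++) { // one clause at a time (TAAT)
                Postings p = clauses[c];
                for (; pos[c] < p.docs.length && p.docs[pos[c]] < max; pos[c]++) {
                    int i = p.docs[pos[c]] & MASK;
                    matching[i >>> 6] |= 1L << i;
                    buckets[i] += p.scores[pos[c]];
                }
            }
            for (int idx = 0; idx < matching.length; idx++) { // replay set bits in order
                for (long bits = matching[idx]; bits != 0; bits &= bits - 1) {
                    int i = (idx << 6) | Long.numberOfTrailingZeros(bits);
                    out[base + i] = buckets[i];
                    buckets[i] = 0; // reset the bucket for the next window
                }
            }
            Arrays.fill(matching, 0L);
        }
        return out;
    }

    public static void main(String[] args) {
        Postings[] clauses = {
            new Postings(new int[] {3, 100, 2050, 5000}, new float[] {1f, 2f, 0.5f, 1f}),
            new Postings(new int[] {100, 2047, 5000}, new float[] {0.25f, 1f, 3f}),
        };
        double[] a = daat(clauses, 6000);
        double[] b = windowed(clauses, 6000);
        System.out.println(Arrays.equals(a, b) + " " + b[100] + " " + b[5000]); // prints "true 2.25 4.0"
    }
}
```

In the windowed version, each hit costs a bit set and an array write instead of a heap operation. The price is a bitset scan per window and the loss of strict doc-ID order while a window is being filled.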
