Glad it helped. :)

On Tue, Jun 15, 2021 at 3:28 PM Greg Miller <gsmil...@gmail.com> wrote:

> Thanks for this explanation, Adrien! I'd been wondering about this a bit
> myself since seeing that DrillSideways also implements a TAAT approach (in
> addition to a doc-at-a-time approach). This really helps clear that up.
> Appreciate you taking the time to explain!
>
> Cheers,
> -Greg
>
> On Mon, Jun 14, 2021 at 2:35 AM Adrien Grand <jpou...@gmail.com> wrote:
>
>> Hello Arihant,
>>
>> The Scorer for disjunctions uses a heap data structure that needs to be
>> reordered upon every hit. While reordering a heap is efficient, since it
>> runs in logarithmic time, having to do it for every document can add
>> non-negligible overhead. BooleanScorer tries to work around this overhead
>> by scoring large windows of documents in a more TAAT (term-at-a-time)
>> fashion, so that Lucene only needs to reorder the heap every 2048 doc IDs
>> (the hardcoded window size).
>>
>> This paper gives a bit more context:
>> http://www.savar.se/media/1181/space_optimizations_for_total_ranking.pdf,
>> see section 4 in particular.
>>
>> On Sat, Jun 12, 2021 at 5:47 PM Arihant Samar <arisam...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I am new here. I would like to know what exact optimisation is
>>> carried out in the BooleanScorer.java code that led to a separate class
>>> for resolving Boolean queries over documents in bulk. I could not find
>>> any material on this in the documentation either, hence I decided to
>>> ask here.
>>>
>>> Thanking you in advance,
>>>
>>> Arihant.
>>
>> --
>> Adrien
>
--
Adrien
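To make the heap-per-hit versus window-at-a-time difference concrete, here is a simplified sketch in Java. It is not Lucene's BooleanScorer: it assumes each postings list is a plain sorted int[] of doc IDs and that every hit of term t adds a fixed weights[t] to the score, and all names (WindowedDisjunctionSketch, Cursor, Hit, scoreDocAtATime, scoreWindowed) are hypothetical. scoreDocAtATime polls and re-adds a heap entry for every matching posting, as described for the disjunction Scorer above; scoreWindowed aligns 2048-doc windows, touches the heap once per clause per window, and accumulates scores term-at-a-time into a bucket array.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.PriorityQueue;

// Simplified sketch, NOT Lucene's BooleanScorer. Assumptions: each postings list
// is a sorted int[] of distinct doc IDs in [0, Integer.MAX_VALUE), and every hit
// of term t adds weights[t] to the document's score. All names are hypothetical.
public class WindowedDisjunctionSketch {

    static final int WINDOW_SIZE = 2048;            // mirrors the window size from the thread
    static final int EXHAUSTED = Integer.MAX_VALUE; // sentinel for a consumed postings list

    public static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
        @Override public String toString() { return doc + ":" + score; }
    }

    // Cursor over one term's postings (a stand-in for a postings iterator).
    static final class Cursor {
        final int[] docs;
        final float weight;
        int pos = 0;
        Cursor(int[] docs, float weight) { this.docs = docs; this.weight = weight; }
        int doc() { return pos < docs.length ? docs[pos] : EXHAUSTED; }
    }

    // Heap of non-exhausted cursors ordered by their current doc ID.
    static PriorityQueue<Cursor> buildHeap(int[][] postings, float[] weights) {
        PriorityQueue<Cursor> heap = new PriorityQueue<>((a, b) -> Integer.compare(a.doc(), b.doc()));
        for (int t = 0; t < postings.length; t++) {
            Cursor c = new Cursor(postings[t], weights[t]);
            if (c.doc() != EXHAUSTED) heap.add(c);
        }
        return heap;
    }

    // Doc-at-a-time: every (doc, term) match costs one poll and one add on the heap.
    public static List<Hit> scoreDocAtATime(int[][] postings, float[] weights) {
        PriorityQueue<Cursor> heap = buildHeap(postings, weights);
        List<Hit> hits = new ArrayList<>();
        while (!heap.isEmpty()) {
            int doc = heap.peek().doc();
            float score = 0f;
            while (!heap.isEmpty() && heap.peek().doc() == doc) {
                Cursor c = heap.poll();
                score += c.weight;
                c.pos++;                               // advance, then reinsert: O(log n)
                if (c.doc() != EXHAUSTED) heap.add(c);
            }
            hits.add(new Hit(doc, score));
        }
        return hits;
    }

    // Windowed term-at-a-time: the heap is touched once per clause per window.
    public static List<Hit> scoreWindowed(int[][] postings, float[] weights) {
        PriorityQueue<Cursor> heap = buildHeap(postings, weights);
        float[] buckets = new float[WINDOW_SIZE];  // per-slot score accumulators
        BitSet matched = new BitSet(WINDOW_SIZE);  // slots that received at least one hit
        List<Cursor> inWindow = new ArrayList<>();
        List<Hit> hits = new ArrayList<>();
        while (!heap.isEmpty()) {
            // Window = aligned block of WINDOW_SIZE doc IDs containing the smallest next doc.
            int base = heap.peek().doc() & ~(WINDOW_SIZE - 1);
            while (!heap.isEmpty() && heap.peek().doc() - base < WINDOW_SIZE) {
                inWindow.add(heap.poll());
            }
            // Drain each clause's postings for this window into the buckets.
            for (Cursor c : inWindow) {
                while (c.doc() != EXHAUSTED && c.doc() - base < WINDOW_SIZE) {
                    int slot = c.doc() - base;
                    buckets[slot] += c.weight;
                    matched.set(slot);
                    c.pos++;
                }
                if (c.doc() != EXHAUSTED) heap.add(c);
            }
            inWindow.clear();
            // Emit this window's hits in doc ID order, then reset for the next window.
            for (int slot = matched.nextSetBit(0); slot >= 0; slot = matched.nextSetBit(slot + 1)) {
                hits.add(new Hit(base + slot, buckets[slot]));
                buckets[slot] = 0f;
            }
            matched.clear();
        }
        return hits;
    }

    public static void main(String[] args) {
        int[][] postings = { {1, 5, 3000, 5000}, {5, 2047, 2048, 5000}, {3000} };
        float[] weights = {1.0f, 0.5f, 0.25f};
        // Both print [1:1.0, 5:1.5, 2047:0.5, 2048:0.5, 3000:1.25, 5000:1.5]
        System.out.println(scoreDocAtATime(postings, weights));
        System.out.println(scoreWindowed(postings, weights));
    }
}
```

Both methods return the same hits; what changes is where the heap work happens. In the windowed version, a clause with many postings in one window costs a single poll and add for the whole window, at the price of a fixed-size score array and bitset that are filled and reset per window. The real BooleanScorer differs in many details that this sketch leaves out.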