Thanks for this explanation, Adrien! I'd been wondering about this myself since seeing that DrillSideways also implements a TAAT approach (in addition to a doc-at-a-time approach). This really helps clear that up. Appreciate you taking the time to explain!

Cheers,
-Greg

On Mon, Jun 14, 2021 at 2:35 AM Adrien Grand <[email protected]> wrote:

> Hello Arihant,
>
> The Scorer for disjunctions uses a heap data structure that needs to be
> reordered upon every hit. While reordering heaps is efficient as it runs in
> logarithmic time, the fact that it needs to run on every document might add
> non-negligible overhead. BooleanScorer tries to work around this overhead
> by scoring large windows of documents in a more TAAT (term-at-a-time)
> fashion so that Lucene only needs to reorder the heap every 2048 doc IDs
> (the hardcoded window size).
>
> This paper gives a bit more context:
> http://www.savar.se/media/1181/space_optimizations_for_total_ranking.pdf,
> see section 4 in particular.
>
> On Sat, Jun 12, 2021 at 5:47 PM Arihant Samar <[email protected]> wrote:
>
>> Hi ,
>>
>> I am new here . I would like to know what is the exact optimisation
>> carried out in “Boolean Scorer.java” code which led to a separate class for
>> resolving Boolean Queries in bulk documents. I could not find any material
>> in the documentation for this as well, hence I decided to ask here.
>>
>>
>> Thanking you in advance,
>>
>> Arihant.
>
> --
> Adrien
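
A rough sketch of the windowing idea described in Adrien's message, for readers who find this thread later. This is not Lucene's BooleanScorer code. The class name, the Postings type, the constant per-term score, and the maxDoc parameter are all made up for illustration, and the sketch leaves out the heap entirely: it simply visits every term in every window, where the real scorer would use the heap to decide which clauses to advance. The only detail taken from the thread is the 2048-document window and the term-at-a-time accumulation inside it.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a toy disjunction scorer that works window by window.
final class WindowedDisjunctionSketch {
    // Window size mentioned in the thread.
    static final int WINDOW_SIZE = 2048;

    // Hypothetical stand-in for a postings list: sorted doc IDs plus one
    // constant score contributed by this term to every matching document.
    static final class Postings {
        final int[] docs;
        final float score;
        int pos = 0; // cursor into docs, only ever moves forward

        Postings(int[] docs, float score) {
            this.docs = docs;
            this.score = score;
        }
    }

    // Returns docID -> summed score for every doc in [0, maxDoc) matched by
    // at least one term, in increasing docID order.
    static Map<Integer, Float> scoreAll(List<Postings> terms, int maxDoc) {
        Map<Integer, Float> hits = new LinkedHashMap<>();
        float[] scores = new float[WINDOW_SIZE];
        boolean[] matched = new boolean[WINDOW_SIZE];

        for (int base = 0; base < maxDoc; base += WINDOW_SIZE) {
            int end = Math.min(base + WINDOW_SIZE, maxDoc);

            // Term-at-a-time inside the window: drain each postings list up
            // to the window end, accumulating into per-slot buckets. No
            // per-document reordering across terms is needed here.
            for (Postings p : terms) {
                while (p.pos < p.docs.length && p.docs[p.pos] < end) {
                    int slot = p.docs[p.pos] - base;
                    scores[slot] += p.score;
                    matched[slot] = true;
                    p.pos++;
                }
            }

            // Flush the window in docID order and reset the buckets.
            for (int slot = 0; slot < end - base; slot++) {
                if (matched[slot]) {
                    hits.put(base + slot, scores[slot]);
                    scores[slot] = 0f;
                    matched[slot] = false;
                }
            }
        }
        return hits;
    }
}
```

Calling scoreAll with two Postings objects and a maxDoc of 4096 walks two windows. The per-document work inside a window is a plain array update, which is the cost the windowed approach trades against a heap reorder on every hit.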
