[ https://issues.apache.org/jira/browse/LUCENE-7979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adrien Grand updated LUCENE-7979:
---------------------------------
Attachment: LUCENE-7979.patch
Here is a patch. It does not pass all tests, e.g. the new priority queue does
not work exactly as MinShouldMatchSumScorer expects, but it should be enough for
benchmarking.
I ran wikimedium10m with the following tasks file, with bulk scoring disabled:
{noformat}
OrHighHigh: several following # freq=436129 freq=416515
OrHighHigh: publisher end # freq=1289029 freq=526636
OrHighHigh: 2009 film # freq=887702 freq=432758
OrHighHigh: http known # freq=3493581 freq=607158
OrHighHigh: south county # freq=560468 freq=521126
OrHighMed: international chris # freq=418261 freq=85523
OrHighMed: right million # freq=630423 freq=175554
OrHighMed: known created # freq=607158 freq=220831
OrHighMed: its universal # freq=1173450 freq=47078
OrHighMed: 9 network # freq=574434 freq=164997
OrHighLow: 2005 valois # freq=835460 freq=2277
OrHighLow: until universalist # freq=425389 freq=1230
OrHighLow: made forays # freq=742313 freq=799
OrHighLow: do bush's # freq=511178 freq=2681
OrHighLow: 10 racedetail.html # freq=918339 freq=870
Or5High5Med5Low: several publisher 2009 http south chris million created universal network valois universalist forays bush's racedetail.html
Or5High5Med5Low: id title s called 2 reform face draft summary 1923 weed violently cantrell 10.1371 veneration
Or128Med: second june several october december july high because 20 general government m books him language february end august list issue same often area november 15 county international 2000 2004 times u.s although based small british group like each series film 18 place now against death her until pp 25 j great west major ii 13 london 14 long e 16 30 us 2003 center large day citation references could x d example population b even another style found do 2012 n 2002 what form those 2001 br public four 17 22 much following east 24 very needed article modern 19 country around f french v according old king within include still did jpg set music doi 21 age power family external using links order own house home german
Or128Med: common r different non among 23 due science class reflist 28 27 political 26 ndash line way military law william kingdom 1999 development she company back central 29 en began period story without england president link original zh category roman short europe party white further image david though given h along human top society ja france school james 01 make 1998 best pdf late point robert man named service research information term local european led w western members present union convert la published important 1997 various popular l off former america text official control water considered uk black third river near five become army just usually established single how said result george down st others edition retrieved 02 land 1996 church support air full few 03 free less
{noformat}
{noformat}
Task              QPS baseline  StdDev  QPS patch  StdDev  Pct diff
OrHighMed                27.75  (2.0%)      19.46  (0.6%)  -29.9% ( -31% - -27%)
OrHighLow                45.52  (2.5%)      32.28  (0.4%)  -29.1% ( -31% - -26%)
OrHighHigh               34.59  (1.5%)      25.91  (0.5%)  -25.1% ( -26% - -23%)
Or5High5Med5Low           3.08  (1.6%)       2.84  (0.6%)   -7.9% (  -9% -  -5%)
Or128Med                  0.24  (1.8%)       0.27  (0.5%)   12.8% (  10% -  15%)
{noformat}
This matches my intuition that the radix heap performs better when there are
many terms, but the threshold looks quite high: even with 15 terms the regular
binary heap still performs better.
Maybe there are ways we could make it perform better for common numbers of
terms in a disjunction?
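For readers unfamiliar with the data structure, here is a minimal sketch of a radix heap over non-negative int keys. It is not the code from the attached patch: the class name RadixHeapSketch and its methods are made up for illustration, and it stores plain ints rather than scorers. The idea is that a disjunction only ever pushes doc IDs that are at least as large as the last one popped, so keys can be bucketed by the highest bit in which they differ from that last popped key.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: monotone priority queue over non-negative int keys (e.g. doc IDs). */
final class RadixHeapSketch {
  // buckets.get(0) holds keys equal to `last`; buckets.get(i) for i >= 1 holds
  // keys whose highest bit differing from `last` is bit i - 1.
  private final List<List<Integer>> buckets = new ArrayList<>();
  private int last = 0; // most recently popped key; pushed keys may not go below it
  private int size = 0;

  RadixHeapSketch() {
    for (int i = 0; i <= 32; i++) {
      buckets.add(new ArrayList<>());
    }
  }

  private int bucketOf(int key) {
    // 0 when key == last, otherwise 1 + index of the highest differing bit.
    return 32 - Integer.numberOfLeadingZeros(key ^ last);
  }

  void push(int key) {
    if (key < last) {
      throw new IllegalArgumentException("key " + key + " is below last popped key " + last);
    }
    buckets.get(bucketOf(key)).add(key);
    size++;
  }

  int pop() {
    if (size == 0) {
      throw new IllegalStateException("heap is empty");
    }
    List<Integer> first = buckets.get(0);
    if (first.isEmpty()) {
      // Find the lowest non-empty bucket; it contains the overall minimum.
      int i = 1;
      while (buckets.get(i).isEmpty()) {
        i++;
      }
      List<Integer> source = buckets.get(i);
      int min = Integer.MAX_VALUE;
      for (int k : source) {
        min = Math.min(min, k);
      }
      last = min;
      // Redistribute relative to the new `last`; every key lands in a lower bucket.
      List<Integer> moved = new ArrayList<>(source);
      source.clear();
      for (int k : moved) {
        buckets.get(bucketOf(k)).add(k);
      }
    }
    size--;
    return first.remove(first.size() - 1);
  }

  int size() {
    return size;
  }
}
```

Pushing 5, 3, 9, 3, 7 and popping twice returns 3 and 3; pushing 4 afterwards is still legal, and the remaining pops return 4, 5, 7, 9. Each key can only move to lower buckets, which gives the amortized bound, but every pop that finds bucket 0 empty pays for a scan and a redistribution. That fixed overhead is one plausible reason (an assumption, not something measured here) why a binary heap wins when there are only a few clauses.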
> Move disjunctions to a radix heap
> ---------------------------------
>
> Key: LUCENE-7979
> URL: https://issues.apache.org/jira/browse/LUCENE-7979
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Trivial
> Attachments: LUCENE-7979.patch
>
>
> An Elasticsearch user argued that we should look into using radix heaps in
> order to run disjunctions so I wanted to give it a try. I'm creating this
> issue to share findings. Spoiler: so far it does not seem to help but maybe
> I'm just doing it wrong?