jpountz commented on issue #12665:
URL: https://github.com/apache/lucene/issues/12665#issuecomment-1770827026
I did a first indexing run on wikibigall with the following merge policy,
which I tried to make as lightweight as possible:
```
BPIndexReorderer reorderer = new BPIndexReorderer();
reorderer.setMinDocFreq(16384);
reorderer.setMaxIters(3);
reorderer.setMinPartitionSize(8192);
mp = new BPReorderingMergePolicy(mp, reorderer, 131072);
```
Indexing ran in 3402170 msec vs. 2610068 msec without reordering, i.e. 30%
slower. (This is when running with default params, i.e. maxBufferedDocs=12119,
SerialMergeScheduler, LogDocMergePolicy (wrapped within
BPReorderingMergePolicy), etc.) Search was noticeably faster on most tasks,
though PKLookup and HighTerm got slower:
```
Task                  QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff                   p-value
PKLookup                    279.70   (1.6%)                   247.15   (2.8%)    -11.6%  ( -15% -   -7%)    0.000
HighTerm                    441.90   (7.8%)                   414.88   (4.4%)     -6.1%  ( -17% -    6%)    0.002
CountOrHighMed               94.53  (16.3%)                    92.77  (15.9%)     -1.9%  ( -29% -   36%)    0.715
CountOrHighHigh              60.96  (16.4%)                    60.03  (16.4%)     -1.5%  ( -29% -   37%)    0.768
HighTermMonthSort          4314.65   (2.3%)                  4274.31   (2.8%)     -0.9%  (  -5% -    4%)    0.248
Respell                      64.51   (1.1%)                    64.38   (1.6%)     -0.2%  (  -2% -    2%)    0.654
CountPhrase                   3.54  (11.2%)                     3.59   (8.2%)      1.2%  ( -16% -   23%)    0.700
Wildcard                     71.79   (2.6%)                    72.66   (2.7%)      1.2%  (  -3% -    6%)    0.143
Fuzzy2                       89.57   (0.9%)                    90.80   (1.2%)      1.4%  (   0% -    3%)    0.000
Prefix3                     123.64   (3.4%)                   125.34   (2.8%)      1.4%  (  -4% -    7%)    0.159
CountTerm                 14193.65   (3.6%)                 14589.84   (2.9%)      2.8%  (  -3% -    9%)    0.007
IntNRQ                      289.67   (6.0%)                   299.22   (5.6%)      3.3%  (  -7% -   15%)    0.074
HighPhrase                    5.95   (7.6%)                     6.16   (9.3%)      3.6%  ( -12% -   22%)    0.180
Fuzzy1                      104.33   (0.9%)                   108.08   (1.2%)      3.6%  (   1% -    5%)    0.000
LowPhrase                    17.70   (3.3%)                    18.74   (4.9%)      5.9%  (  -2% -   14%)    0.000
MedTerm                     533.08   (7.8%)                   568.40   (4.5%)      6.6%  (  -5% -   20%)    0.001
OrHighHigh                   56.43   (5.8%)                    60.45   (7.0%)      7.1%  (  -5% -   21%)    0.000
CountAndHighMed             124.71   (3.2%)                   136.61   (4.5%)      9.5%  (   1% -   17%)    0.000
OrHighMed                   212.88   (4.0%)                   233.35   (5.0%)      9.6%  (   0% -   19%)    0.000
OrHighLow                   604.12   (2.8%)                   676.18   (4.5%)     11.9%  (   4% -   19%)    0.000
AndHighLow                  933.07   (2.3%)                  1046.85   (2.7%)     12.2%  (   7% -   17%)    0.000
LowTerm                     947.45   (6.1%)                  1091.11   (4.7%)     15.2%  (   4% -   27%)    0.000
AndHighMed                  197.62   (3.1%)                   232.11   (3.1%)     17.5%  (  10% -   24%)    0.000
MedPhrase                    42.93   (2.7%)                    50.47   (5.4%)     17.6%  (   9% -   26%)    0.000
AndHighHigh                  52.74   (4.0%)                    63.41   (4.6%)     20.2%  (  11% -   30%)    0.000
CountAndHighHigh             41.69   (3.5%)                    50.60   (5.6%)     21.4%  (  11% -   31%)    0.000
HighTermDayOfYearSort       444.76   (1.7%)                   652.11   (2.0%)     46.6%  (  42% -   51%)    0.000
```
I'll look into whether I can reduce the merge-time overhead.
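
For anyone who wants to sanity-check the numbers, the 30% indexing overhead and the Pct diff column appear to be plain relative changes against the baseline. A minimal sketch, assuming that definition; the class and method names are made up for illustration and are not part of Lucene or the benchmark tooling:

```java
import java.util.Locale;

class ReorderingOverhead {
    /** Relative change of candidate versus baseline, in percent. */
    static double pctDiff(double baseline, double candidate) {
        return 100.0 * (candidate - baseline) / baseline;
    }

    public static void main(String[] args) {
        // Indexing wall-clock times from the run above, in milliseconds.
        double withoutReordering = 2610068;
        double withReordering = 3402170;
        System.out.printf(Locale.ROOT, "indexing time: %+.1f%%%n",
            pctDiff(withoutReordering, withReordering));

        // QPS values copied from two rows of the table (baseline, my_modified_version).
        System.out.printf(Locale.ROOT, "AndHighHigh QPS: %+.1f%%%n", pctDiff(52.74, 63.41));
        System.out.printf(Locale.ROOT, "PKLookup QPS: %+.1f%%%n", pctDiff(279.70, 247.15));
    }
}
```

Note the sign convention: for indexing time a positive value is a slowdown, while for QPS a positive value is a speedup.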