[
https://issues.apache.org/jira/browse/LUCENE-6030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14187265#comment-14187265
]
Ryan Ernst commented on LUCENE-6030:
------------------------------------
I ran some performance tests with luceneutil, and the numbers are OK but
not great. HotSpot seems to get confused sometimes, leading to a QPS decline.
On Java 7, using wikimedium10m:
{noformat}
Task              QPS baseline   StdDev  QPS patch   StdDev  Pct diff
OrNotHighMed             57.06   (8.9%)      49.56   (5.4%)    -13.2%  ( -25% - 1%)
OrNotHighLow            121.53   (8.8%)     106.02   (5.7%)    -12.8%  ( -25% - 1%)
OrNotHighHigh            58.92   (8.9%)      51.42   (5.4%)    -12.7%  ( -24% - 1%)
OrHighNotHigh            68.08   (8.9%)      59.50   (5.5%)    -12.6%  ( -24% - 1%)
OrHighHigh               25.97   (8.7%)      22.73   (5.3%)    -12.5%  ( -24% - 1%)
OrHighNotLow             90.21   (8.8%)      80.24   (6.2%)    -11.1%  ( -23% - 4%)
HighTerm                126.83   (1.8%)     112.85   (1.9%)    -11.0%  ( -14% - -7%)
OrHighLow               104.86   (8.8%)      93.32   (5.9%)    -11.0%  ( -23% - 4%)
OrHighNotMed            109.46   (8.3%)     100.87   (6.0%)     -7.8%  ( -20% - 7%)
MedTerm                 200.05   (1.7%)     187.49   (1.8%)     -6.3%  ( -9% - -2%)
OrHighMed               118.77   (8.0%)     113.79   (6.3%)     -4.2%  ( -17% - 10%)
Prefix3                  82.16   (3.1%)      81.47   (4.4%)     -0.8%  ( -8% - 6%)
HighSpanNear             14.16   (3.8%)      14.05   (4.1%)     -0.8%  ( -8% - 7%)
IntNRQ                   11.53   (4.9%)      11.44   (6.4%)     -0.8%  ( -11% - 11%)
HighPhrase                3.70  (14.2%)       3.67  (14.2%)     -0.7%  ( -25% - 32%)
HighSloppyPhrase          4.46   (6.7%)       4.43   (6.1%)     -0.7%  ( -12% - 12%)
Fuzzy2                   81.39   (2.5%)      81.43   (2.4%)      0.0%  ( -4% - 5%)
AndHighLow             1104.54   (1.7%)    1105.90   (3.0%)      0.1%  ( -4% - 4%)
Wildcard                 42.71   (3.9%)      42.76   (3.6%)      0.1%  ( -7% - 7%)
Respell                  74.16   (2.4%)      74.33   (1.9%)      0.2%  ( -3% - 4%)
MedSpanNear              24.58   (3.3%)      24.69   (3.3%)      0.5%  ( -5% - 7%)
LowPhrase                44.89   (2.1%)      45.17   (2.3%)      0.6%  ( -3% - 5%)
Fuzzy1                   98.83   (2.5%)      99.49   (2.5%)      0.7%  ( -4% - 5%)
MedPhrase               107.99   (6.0%)     109.06   (6.0%)      1.0%  ( -10% - 13%)
MedSloppyPhrase          19.96   (3.0%)      20.24   (3.3%)      1.4%  ( -4% - 8%)
LowSpanNear              37.75   (3.4%)      38.38   (3.5%)      1.7%  ( -5% - 8%)
LowSloppyPhrase          31.39   (2.8%)      31.98   (3.2%)      1.9%  ( -4% - 8%)
AndHighHigh              62.62   (1.0%)      64.48   (1.6%)      3.0%  ( 0% - 5%)
AndHighMed              187.48   (1.0%)     193.88   (1.6%)      3.4%  ( 0% - 6%)
LowTerm                 772.23   (2.9%)     970.78   (6.8%)     25.7%  ( 15% - 36%)
{noformat}
On Java 8, the decline is less pronounced:
{noformat}
Task              QPS baseline   StdDev  QPS patch   StdDev  Pct diff
HighTerm                107.28   (4.2%)      92.63   (3.0%)    -13.7%  ( -19% - -6%)
OrNotHighLow            103.14  (10.2%)      94.37   (4.9%)     -8.5%  ( -21% - 7%)
OrNotHighMed            103.75  (10.8%)      95.47   (5.3%)     -8.0%  ( -21% - 9%)
OrNotHighHigh            39.62  (11.9%)      36.56   (6.3%)     -7.7%  ( -23% - 11%)
OrHighNotHigh            31.88  (12.9%)      29.51   (7.1%)     -7.4%  ( -24% - 14%)
OrHighHigh               26.44  (13.6%)      24.59   (7.9%)     -7.0%  ( -25% - 16%)
OrHighLow                74.93  (14.5%)      70.41   (8.7%)     -6.0%  ( -25% - 20%)
OrHighNotLow            106.31  (14.0%)     101.20   (8.7%)     -4.8%  ( -24% - 20%)
OrHighNotMed             59.98  (13.5%)      57.84   (8.5%)     -3.6%  ( -22% - 21%)
HighPhrase               78.65   (5.1%)      76.22   (4.5%)     -3.1%  ( -12% - 6%)
HighSloppyPhrase         18.62   (6.5%)      18.32   (4.7%)     -1.6%  ( -12% - 10%)
OrHighMed                79.70  (13.3%)      78.73   (9.0%)     -1.2%  ( -20% - 24%)
MedPhrase                26.06   (3.4%)      25.94   (3.1%)     -0.5%  ( -6% - 6%)
Fuzzy2                  114.17   (3.4%)     113.86   (3.5%)     -0.3%  ( -6% - 6%)
HighSpanNear             27.20   (6.2%)      27.21   (5.0%)      0.0%  ( -10% - 11%)
LowPhrase                36.88   (2.1%)      36.95   (2.1%)      0.2%  ( -4% - 4%)
Fuzzy1                  136.96   (3.2%)     137.26   (3.5%)      0.2%  ( -6% - 7%)
AndHighLow             1517.11   (4.2%)    1523.95   (4.1%)      0.5%  ( -7% - 9%)
Respell                  87.37   (2.8%)      87.85   (2.6%)      0.5%  ( -4% - 6%)
LowSloppyPhrase          63.60   (4.2%)      64.10   (3.5%)      0.8%  ( -6% - 8%)
Wildcard                 20.92   (4.7%)      21.09   (3.2%)      0.8%  ( -6% - 9%)
MedTerm                 359.22   (3.1%)     362.24   (3.0%)      0.8%  ( -5% - 7%)
MedSpanNear              14.74   (4.5%)      14.90   (4.3%)      1.0%  ( -7% - 10%)
Prefix3                  51.84   (6.8%)      52.41   (5.0%)      1.1%  ( -9% - 13%)
IntNRQ                   12.60   (8.0%)      12.79   (5.8%)      1.5%  ( -11% - 16%)
AndHighMed              338.81   (1.5%)     345.34   (1.5%)      1.9%  ( -1% - 5%)
MedSloppyPhrase          60.72   (6.1%)      61.97   (5.1%)      2.1%  ( -8% - 14%)
AndHighHigh              77.59   (1.4%)      80.17   (1.4%)      3.3%  ( 0% - 6%)
LowSpanNear             215.18   (5.4%)     223.41   (4.4%)      3.8%  ( -5% - 14%)
LowTerm                1043.18   (5.0%)    1123.42   (5.9%)      7.7%  ( -2% - 19%)
{noformat}
However, the patch has a huge impact on size. For wikimedium10m, the norms
data (the .nvd files) shrank by about half, from 9.5M to 4.9M in total:
{noformat}
rjernst@codex:~/code/ls-util$ du -cksh indices/wikimedium10m.trunk.Lucene50.nd10M/index/*.nvd
1.8M indices/wikimedium10m.trunk.Lucene50.nd10M/index/_32.nvd
1.8M indices/wikimedium10m.trunk.Lucene50.nd10M/index/_65.nvd
1.8M indices/wikimedium10m.trunk.Lucene50.nd10M/index/_98.nvd
1.8M indices/wikimedium10m.trunk.Lucene50.nd10M/index/_cb.nvd
1.8M indices/wikimedium10m.trunk.Lucene50.nd10M/index/_fe.nvd
180K indices/wikimedium10m.trunk.Lucene50.nd10M/index/_fp.nvd
180K indices/wikimedium10m.trunk.Lucene50.nd10M/index/_g0.nvd
180K indices/wikimedium10m.trunk.Lucene50.nd10M/index/_gb.nvd
92K indices/wikimedium10m.trunk.Lucene50.nd10M/index/_gm.nvd
180K indices/wikimedium10m.trunk.Lucene50.nd10M/index/_gx.nvd
20K indices/wikimedium10m.trunk.Lucene50.nd10M/index/_gy.nvd
12K indices/wikimedium10m.trunk.Lucene50.nd10M/index/_gz.nvd
12K indices/wikimedium10m.trunk.Lucene50.nd10M/index/_h0.nvd
12K indices/wikimedium10m.trunk.Lucene50.nd10M/index/_h1.nvd
12K indices/wikimedium10m.trunk.Lucene50.nd10M/index/_h2.nvd
4.0K indices/wikimedium10m.trunk.Lucene50.nd10M/index/_h3.nvd
9.5M total
du -cksh indices/wikimedium10m.patch.Lucene50.nd10M/index/*.nvd
880K indices/wikimedium10m.patch.Lucene50.nd10M/index/_32.nvd
880K indices/wikimedium10m.patch.Lucene50.nd10M/index/_65.nvd
880K indices/wikimedium10m.patch.Lucene50.nd10M/index/_98.nvd
880K indices/wikimedium10m.patch.Lucene50.nd10M/index/_cb.nvd
880K indices/wikimedium10m.patch.Lucene50.nd10M/index/_fe.nvd
92K indices/wikimedium10m.patch.Lucene50.nd10M/index/_fp.nvd
92K indices/wikimedium10m.patch.Lucene50.nd10M/index/_g0.nvd
92K indices/wikimedium10m.patch.Lucene50.nd10M/index/_gb.nvd
92K indices/wikimedium10m.patch.Lucene50.nd10M/index/_gm.nvd
92K indices/wikimedium10m.patch.Lucene50.nd10M/index/_gx.nvd
12K indices/wikimedium10m.patch.Lucene50.nd10M/index/_gy.nvd
12K indices/wikimedium10m.patch.Lucene50.nd10M/index/_gz.nvd
12K indices/wikimedium10m.patch.Lucene50.nd10M/index/_h0.nvd
12K indices/wikimedium10m.patch.Lucene50.nd10M/index/_h1.nvd
12K indices/wikimedium10m.patch.Lucene50.nd10M/index/_h2.nvd
4.0K indices/wikimedium10m.patch.Lucene50.nd10M/index/_h3.nvd
4.9M total
{noformat}
> Add norms patched compression which uses table for most common values
> ---------------------------------------------------------------------
>
> Key: LUCENE-6030
> URL: https://issues.apache.org/jira/browse/LUCENE-6030
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Ryan Ernst
> Attachments: LUCENE-6030.patch
>
>
> We have added the PATCHED norms sub format in lucene 50, which uses a bitset
> to mark documents that have the most common value (when >97% of the documents
> have that value). This works well for fields that have a predominant value
> length, and then a small number of docs with some other random values. But
> another common case is having a handful of very common value lengths, like
> with a title field.
> We can use a table (see TABLE_COMPRESSION) to store the most common values,
> and save an ordinal for the "other" case, at which point we can look up the
> value in the secondary patch table.
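
The scheme in the description above can be illustrated with a rough Java sketch. This is not the code from LUCENE-6030.patch and says nothing about the on-disk format: the class name, the fields, the plain int[] of ordinals, and the HashMap used as the patch table are all hypothetical, chosen only to show the lookup path (a real codec would presumably bit-pack the ordinals). Each document stores a small ordinal into a table of the most common values, and one extra ordinal is reserved to mean "other", in which case the value comes from the secondary patch table.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; names and layout are assumptions, not Lucene APIs.
class PatchedTableNormsSketch {
    final long[] table;                                  // most common values, indexed by ordinal
    final int[] ordinals;                                // per-document ordinal; table.length means "other"
    final Map<Integer, Long> patches = new HashMap<>();  // docID -> uncommon value

    PatchedTableNormsSketch(long[] norms, int tableSize) {
        // Count how often each norm value occurs.
        Map<Long, Integer> freq = new HashMap<>();
        for (long v : norms) {
            freq.merge(v, 1, Integer::sum);
        }
        // Keep the tableSize most frequent values in the table.
        List<Map.Entry<Long, Integer>> entries = new ArrayList<>(freq.entrySet());
        entries.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        int n = Math.min(tableSize, entries.size());
        table = new long[n];
        Map<Long, Integer> ordOf = new HashMap<>();
        for (int i = 0; i < n; i++) {
            table[i] = entries.get(i).getKey();
            ordOf.put(table[i], i);
        }
        // Common values get their table ordinal; everything else gets the
        // reserved "other" ordinal and an entry in the patch table.
        ordinals = new int[norms.length];
        for (int doc = 0; doc < norms.length; doc++) {
            Integer ord = ordOf.get(norms[doc]);
            if (ord != null) {
                ordinals[doc] = ord;
            } else {
                ordinals[doc] = n;
                patches.put(doc, norms[doc]);
            }
        }
    }

    // Table lookup for common values, patch-table lookup for the "other" ordinal.
    long get(int doc) {
        int ord = ordinals[doc];
        return ord < table.length ? table[ord] : patches.get(doc);
    }
}
```

With a title-like field where a handful of lengths dominate, almost every document resolves through the small table, and only the rare outliers pay for a patch-table entry.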