[https://issues.apache.org/jira/browse/LUCENE-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692837#comment-13692837]
Adrien Grand commented on LUCENE-5077:
--------------------------------------
I ran the WIKI_MEDIUM_1M benchmark again with various norms formats, and
Lucene42NormsFormat with PackedInts.DEFAULT doesn't look bad:
{noformat}
Default norms format: 1991830 bytes of norms
Lucene42NormsFormat(PackedInts.DEFAULT): 909910 bytes of norms
Task              QPS trunk    StdDev  QPS packed norms    StdDev  Pct diff
HighTerm             758.15    (6.4%)            643.01    (7.5%)  -15.2% ( -27% - -1%)
OrHighHigh           296.86   (10.3%)            280.84   (10.6%)   -5.4% ( -23% - 17%)
OrHighMed            218.24   (10.7%)            209.35   (10.9%)   -4.1% ( -23% - 19%)
Fuzzy2               140.18    (4.0%)            135.14    (5.3%)   -3.6% ( -12% - 5%)
MedTerm             1578.99    (7.4%)           1546.60    (4.8%)   -2.1% ( -13% - 10%)
HighPhrase           160.42    (6.6%)            157.22    (4.0%)   -2.0% ( -11% - 9%)
OrHighLow            552.01    (9.9%)            543.15   (10.8%)   -1.6% ( -20% - 21%)
PKLookup             386.15    (5.4%)            382.35    (4.5%)   -1.0% ( -10% - 9%)
MedSpanNear          135.61    (3.5%)            134.41    (4.1%)   -0.9% ( -8% - 7%)
HighSpanNear          10.72    (3.2%)             10.63    (2.2%)   -0.8% ( -6% - 4%)
HighSloppyPhrase      47.29    (4.3%)             47.09    (5.0%)   -0.4% ( -9% - 9%)
LowSpanNear           63.62    (3.4%)             63.83    (4.1%)    0.3% ( -6% - 8%)
Respell              117.48    (4.8%)            118.03    (4.2%)    0.5% ( -8% - 9%)
Wildcard             288.18    (4.0%)            289.88    (4.3%)    0.6% ( -7% - 9%)
AndHighHigh          478.72    (3.7%)            481.87    (3.2%)    0.7% ( -6% - 7%)
Prefix3             1399.57    (3.8%)           1410.64    (6.0%)    0.8% ( -8% - 10%)
MedSloppyPhrase      233.10    (3.8%)            235.37    (4.2%)    1.0% ( -6% - 9%)
AndHighMed           751.65    (3.7%)            759.12    (4.7%)    1.0% ( -7% - 9%)
MedPhrase            119.14    (5.2%)            120.52    (4.7%)    1.2% ( -8% - 11%)
Fuzzy1               142.29    (3.7%)            144.50    (4.5%)    1.6% ( -6% - 10%)
AndHighLow          2365.88    (6.6%)           2407.32    (4.7%)    1.8% ( -8% - 13%)
LowPhrase            256.84    (4.3%)            262.04    (2.6%)    2.0% ( -4% - 9%)
LowSloppyPhrase      313.62    (2.9%)            321.21    (3.5%)    2.4% ( -3% - 9%)
IntNRQ               117.27    (7.1%)            121.22   (11.0%)    3.4% ( -13% - 23%)
LowTerm             2760.64    (4.5%)           2907.64    (6.8%)    5.3% ( -5% - 17%)

Lucene42NormsFormat(PackedInts.DEFAULT): 896406 bytes of norms
Task              QPS trunk    StdDev  QPS packed norms    StdDev  Pct diff
HighTerm             698.74    (9.5%)            607.43    (8.0%)  -13.1% ( -27% - 4%)
OrHighHigh           247.01    (6.3%)            216.49    (5.8%)  -12.4% ( -23% - 0%)
OrHighMed            339.84    (6.1%)            301.83    (7.1%)  -11.2% ( -23% - 2%)
OrHighLow            385.26    (5.6%)            342.81    (7.5%)  -11.0% ( -22% - 2%)
MedTerm             1100.36   (10.0%)            983.30    (7.5%)  -10.6% ( -25% - 7%)
HighPhrase           181.74    (8.1%)            176.96    (5.9%)   -2.6% ( -15% - 12%)
Fuzzy1               157.29    (5.1%)            154.49    (4.7%)   -1.8% ( -10% - 8%)
HighSpanNear          34.67    (3.6%)             34.13    (2.5%)   -1.5% ( -7% - 4%)
Prefix3              437.45    (6.1%)            431.17    (6.0%)   -1.4% ( -12% - 11%)
HighSloppyPhrase       5.96    (4.1%)              5.91    (2.7%)   -0.8% ( -7% - 6%)
MedSloppyPhrase      264.84    (4.2%)            262.92    (4.9%)   -0.7% ( -9% - 8%)
Respell              194.30    (5.8%)            192.95    (4.3%)   -0.7% ( -10% - 9%)
MedPhrase            132.99    (5.6%)            132.37    (5.2%)   -0.5% ( -10% - 10%)
Wildcard             235.47    (4.8%)            235.00    (4.5%)   -0.2% ( -9% - 9%)
AndHighHigh          338.04    (3.3%)            337.96    (2.4%)   -0.0% ( -5% - 5%)
LowPhrase            353.22    (6.9%)            353.80    (5.3%)    0.2% ( -11% - 13%)
LowSpanNear           79.68    (3.6%)             79.98    (4.5%)    0.4% ( -7% - 8%)
Fuzzy2                79.15    (6.6%)             79.49    (5.6%)    0.4% ( -11% - 13%)
PKLookup             387.23    (6.7%)            389.36    (4.5%)    0.5% ( -10% - 12%)
LowSloppyPhrase      649.88    (2.7%)            655.05    (4.2%)    0.8% ( -5% - 7%)
IntNRQ               191.57    (7.7%)            195.08    (9.8%)    1.8% ( -14% - 20%)
AndHighLow          2025.29    (7.1%)           2065.03    (6.4%)    2.0% ( -10% - 16%)
MedSpanNear          415.85    (4.5%)            426.71    (4.0%)    2.6% ( -5% - 11%)
AndHighMed           956.96    (5.4%)            990.30    (6.6%)    3.5% ( -8% - 16%)
LowTerm             2644.68    (7.4%)           2745.68    (8.1%)    3.8% ( -10% - 20%)

DiskNormsFormat (same as DiskDVF but for norms): 896314 bytes of norms
Task              QPS trunk    StdDev  QPS packed norms    StdDev  Pct diff
HighTerm             359.42   (12.9%)            204.00    (2.5%)  -43.2% ( -51% - -32%)
OrHighHigh           269.86    (7.4%)            177.72    (4.1%)  -34.1% ( -42% - -24%)
OrHighLow            358.36    (8.1%)            238.59    (4.1%)  -33.4% ( -42% - -23%)
OrHighMed            305.65    (8.6%)            207.21    (4.7%)  -32.2% ( -41% - -20%)
MedTerm             1342.66    (9.2%)            913.30    (3.4%)  -32.0% ( -40% - -21%)
LowTerm             2849.62   (10.9%)           2449.59    (5.4%)  -14.0% ( -27% - 2%)
AndHighHigh          278.22    (3.8%)            249.40    (2.4%)  -10.4% ( -15% - -4%)
HighPhrase           141.20    (6.5%)            131.19    (4.3%)   -7.1% ( -16% - 3%)
AndHighMed           410.39    (3.5%)            399.99    (3.1%)   -2.5% ( -8% - 4%)
HighSpanNear          42.28    (2.7%)             41.21    (2.8%)   -2.5% ( -7% - 3%)
AndHighLow          1932.50    (8.4%)           1895.71    (8.0%)   -1.9% ( -16% - 15%)
Fuzzy1               171.83    (4.0%)            168.69    (4.3%)   -1.8% ( -9% - 6%)
Fuzzy2                47.29    (4.1%)             46.75    (3.1%)   -1.1% ( -7% - 6%)
Wildcard             441.76    (4.8%)            437.28    (4.8%)   -1.0% ( -10% - 8%)
Respell              133.99    (3.7%)            132.66    (2.8%)   -1.0% ( -7% - 5%)
IntNRQ               125.99    (8.7%)            125.24    (7.5%)   -0.6% ( -15% - 17%)
MedSpanNear          107.53    (3.2%)            107.04    (4.9%)   -0.5% ( -8% - 7%)
Prefix3              570.56    (4.7%)            568.06    (4.9%)   -0.4% ( -9% - 9%)
MedSloppyPhrase      247.61    (4.4%)            249.33    (3.6%)    0.7% ( -7% - 9%)
LowPhrase            223.67    (3.7%)            225.77    (3.9%)    0.9% ( -6% - 8%)
HighSloppyPhrase      46.13    (4.8%)             46.68    (5.9%)    1.2% ( -9% - 12%)
PKLookup             381.14    (2.5%)            385.72    (4.3%)    1.2% ( -5% - 8%)
LowSpanNear          109.87    (3.6%)            111.83    (4.7%)    1.8% ( -6% - 10%)
LowSloppyPhrase      179.23    (3.3%)            184.36    (4.2%)    2.9% ( -4% - 10%)
MedPhrase            202.33    (3.0%)            208.91    (4.0%)    3.3% ( -3% - 10%)
{noformat}
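To put the norms sizes in perspective, the ratios against the default format can be worked out directly. The following is a small, self-contained Java sketch with no Lucene dependency; the byte counts are copied from the benchmark output, and the labels are only identifiers for the three tables.

```java
// Worked arithmetic on the norms sizes reported in the benchmark output.
final class NormsSizeRatios {
    // Size reported for the default norms format, used as the baseline.
    static final long DEFAULT_FORMAT_BYTES = 1_991_830L;

    /** Fraction of the default norms size taken by another format. */
    static double fractionOfDefault(long bytes) {
        return (double) bytes / DEFAULT_FORMAT_BYTES;
    }

    public static void main(String[] args) {
        String[] labels = {
            "Lucene42NormsFormat(PackedInts.DEFAULT), first table",
            "Lucene42NormsFormat(PackedInts.DEFAULT), second table",
            "DiskNormsFormat, third table"
        };
        long[] bytes = {909_910L, 896_406L, 896_314L};
        for (int i = 0; i < bytes.length; i++) {
            double f = fractionOfDefault(bytes[i]);
            // Percentage of the default size, and how many times smaller it is.
            System.out.printf("%s: %.1f%% of default, %.2fx smaller%n", labels[i], 100 * f, 1 / f);
        }
    }
}
```

All three alternatives come out at roughly 45 to 46% of the default size, i.e. a bit more than 2x smaller, so the differences between them are mostly about query speed rather than space.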
> make it easier to use compressed norms
> --------------------------------------
>
> Key: LUCENE-5077
> URL: https://issues.apache.org/jira/browse/LUCENE-5077
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Fix For: 5.0, 4.4
>
> Attachments: LUCENE-5077.patch
>
>
> Lucene42DVConsumer's ctor takes acceptableOverheadRatio, so that you can
> trade off time/space, and we pass PackedInts.FASTEST so we always use 8 bits
> per value.
> But the class is package-private, so if I want to make my own NormsFormat and
> pass e.g. PackedInts.COMPACT, I can't ... I think we should make this class
> public / @experimental?
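The issue notes that passing PackedInts.FASTEST always results in 8 bits per value. A simplified, hypothetical model of the width selection shows why: the overhead ratio caps the bits allowed per value at b + (int)(ratio × b), and a byte-aligned width is used whenever it fits in that budget. Assuming FASTEST is 7 (as in Lucene 4.x), any width b from 1 to 8 gets a budget of 8b ≥ 8 bits, so norms always end up byte-aligned. This is only a sketch of the idea: the real PackedInts.fastestFormatAndBits also considers other packed formats, and the class and method names below are illustrative, not Lucene APIs.

```java
// Simplified, hypothetical model of overhead-ratio-driven width selection.
// This is NOT Lucene's implementation; names here are illustrative only.
final class OverheadSketch {
    // Ratio values assumed to mirror the PackedInts constants of Lucene 4.x.
    static final float COMPACT = 0f;
    static final float DEFAULT = 0.25f;
    static final float FASTEST = 7f;

    /** Bits actually used per value, given the minimum bits needed and the allowed overhead. */
    static int chooseBitsPerValue(int bitsPerValue, float acceptableOverheadRatio) {
        // Clamp the ratio to the [COMPACT, FASTEST] range.
        float ratio = Math.max(COMPACT, Math.min(FASTEST, acceptableOverheadRatio));
        int maxBitsPerValue = bitsPerValue + (int) (ratio * bitsPerValue);
        // Prefer a byte-aligned width (fast to decode) if it fits within the budget.
        for (int aligned : new int[] {8, 16, 32, 64}) {
            if (bitsPerValue <= aligned && maxBitsPerValue >= aligned) {
                return aligned;
            }
        }
        // Otherwise keep the minimum width, i.e. pack the values tightly.
        return bitsPerValue;
    }

    /** Bytes needed to store valueCount values at the given width, rounded up. */
    static long storageBytes(long valueCount, int bitsPerValue) {
        return (valueCount * bitsPerValue + 7) / 8;
    }

    public static void main(String[] args) {
        int bpv = 3;             // hypothetical: norms that fit in 3 bits
        long docs = 1_000_000L;  // hypothetical document count
        for (float r : new float[] {COMPACT, DEFAULT, FASTEST}) {
            int bits = chooseBitsPerValue(bpv, r);
            System.out.printf("ratio=%.2f -> %d bits/value, %d bytes%n", r, bits, storageBytes(docs, bits));
        }
    }
}
```

With COMPACT (and, for 3-bit values, DEFAULT) the values stay tightly packed, which is the time/space trade-off the issue wants to expose. In this model DEFAULT still widens 7-bit values to a full byte, since 7 + (int)(0.25 × 7) = 8.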