[ https://issues.apache.org/jira/browse/LUCENE-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adrien Grand updated LUCENE-7839:
---------------------------------
Attachment: LUCENE-7839.patch
I tried to leverage the iterator API similarly to what numeric doc values do,
but luceneutil seems to notice a performance hit:
{noformat}
Task                  QPS baseline   StdDev  QPS patch   StdDev  Pct diff
HighTerm                    569.71  (11.5%)     490.35   (9.0%)  -13.9% ( -30% -    7%)
OrHighHigh                  138.08  (11.6%)     123.27   (7.1%)  -10.7% ( -26% -    9%)
OrHighMed                   295.37  (11.2%)     269.99   (8.1%)   -8.6% ( -25% -   12%)
OrHighLow                   379.17   (9.1%)     351.63   (6.4%)   -7.3% ( -20% -    9%)
MedTerm                    1518.29  (11.9%)    1421.77   (6.8%)   -6.4% ( -22% -   14%)
AndHighHigh                 386.22   (9.3%)     367.76   (9.0%)   -4.8% ( -21% -   14%)
LowTerm                    3236.73   (8.3%)    3118.34   (8.3%)   -3.7% ( -18% -   14%)
MedSloppyPhrase             555.94   (9.6%)     537.02   (6.3%)   -3.4% ( -17% -   13%)
HighTermDayOfYearSort       330.62  (12.2%)     320.20   (9.8%)   -3.2% ( -22% -   21%)
MedPhrase                   635.77   (9.6%)     616.12   (8.1%)   -3.1% ( -18% -   16%)
HighSloppyPhrase            147.02   (8.6%)     142.77   (7.9%)   -2.9% ( -17% -   14%)
IntNRQ                      117.56   (9.8%)     114.43  (10.2%)   -2.7% ( -20% -   19%)
HighSpanNear                 57.73   (7.9%)      56.21   (7.4%)   -2.6% ( -16% -   13%)
LowSloppyPhrase             385.52   (8.9%)     375.39   (6.5%)   -2.6% ( -16% -   13%)
LowPhrase                   653.67   (9.7%)     637.17   (7.4%)   -2.5% ( -17% -   16%)
Prefix3                     287.63  (12.3%)     281.78  (10.3%)   -2.0% ( -21% -   23%)
Respell                     144.41   (7.8%)     141.67   (6.7%)   -1.9% ( -15% -   13%)
AndHighMed                  676.46   (8.3%)     665.05   (9.8%)   -1.7% ( -18% -   17%)
Wildcard                    214.90   (8.5%)     211.57   (7.0%)   -1.5% ( -15% -   15%)
HighPhrase                   20.11   (9.7%)      20.03   (8.5%)   -0.4% ( -17% -   19%)
MedSpanNear                 476.40   (8.7%)     476.48   (7.7%)    0.0% ( -15% -   18%)
AndHighLow                  964.81   (9.8%)     965.18   (8.0%)    0.0% ( -16% -   19%)
HighTermMonthSort          1190.72   (9.6%)    1194.44  (11.4%)    0.3% ( -18% -   23%)
LowSpanNear                 421.27   (7.8%)     423.97   (9.9%)    0.6% ( -15% -   19%)
Fuzzy2                       49.17  (16.2%)      50.09  (19.1%)    1.9% ( -28% -   44%)
Fuzzy1                      129.89  (12.6%)     132.32  (11.9%)    1.9% ( -20% -   30%)
{noformat}
I have attached the patch that I experimented with. It keeps the current levels
of compression, but splits the values into blocks of 2^14 values and chooses
the number of bits per value on a per-block basis. Maybe there is a better way
to do this...
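
To make the per-block idea concrete, here is a rough sketch of choosing a bit
width for each block of 2^14 values. It is not the code from the attached
patch: the class and method names are hypothetical, it assumes non-negative
values, and it allows any bit width rather than the fixed compression levels
that the norms format uses today.

```java
/** Hypothetical illustration only; not taken from LUCENE-7839.patch. */
final class BlockBitsSketch {
    // Block size mentioned in the comment: 2^14 values per block.
    static final int BLOCK_SHIFT = 14;
    static final int BLOCK_SIZE = 1 << BLOCK_SHIFT;

    /** Bits needed to store maxValue as an unsigned number (at least 1). */
    static int bitsRequired(long maxValue) {
        if (maxValue < 0) {
            throw new IllegalArgumentException("negative value: " + maxValue);
        }
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    /** For each block, the bit width needed for the largest value in it. */
    static int[] bitsPerBlock(long[] values) {
        int numBlocks = (values.length + BLOCK_SIZE - 1) >>> BLOCK_SHIFT;
        int[] bits = new int[numBlocks];
        for (int block = 0; block < numBlocks; block++) {
            int start = block << BLOCK_SHIFT;
            int end = Math.min(values.length, start + BLOCK_SIZE);
            long max = 0;
            for (int i = start; i < end; i++) {
                if (values[i] < 0) {
                    throw new IllegalArgumentException("negative value at index " + i);
                }
                max = Math.max(max, values[i]);
            }
            bits[block] = bitsRequired(max);
        }
        return bits;
    }
}
```

With this layout a reader has to look up the block's bit width before decoding
a value, which is one plausible source of the extra per-document cost, though
the benchmark above does not isolate it.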
> Optimize the default NormsFormat for the case that all norms are in 0..16
> -------------------------------------------------------------------------
>
> Key: LUCENE-7839
> URL: https://issues.apache.org/jira/browse/LUCENE-7839
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Adrien Grand
> Priority: Minor
> Attachments: LUCENE-7839.patch
>
>
> Given how we now store the length of the field in norms, we could optimize
> the default norms format for the case that all norms are in 0..16 and store
> them on 4 bits. This would be picked up for short fields that have fewer than
> 16 terms (e.g. title fields) and would halve disk utilization.
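
For reference, storing norms on 4 bits amounts to packing two values into each
byte. A minimal sketch, assuming every value fits in 4 bits (0 to 15
inclusive); the class and method names are hypothetical and this is not how
any existing NormsFormat is implemented:

```java
/** Hypothetical illustration of 4-bit packing; not Lucene code. */
final class NibblePackingSketch {
    /** Packs values in 0..15 two per byte, low nibble first. */
    static byte[] pack(int[] values) {
        byte[] packed = new byte[(values.length + 1) / 2];
        for (int i = 0; i < values.length; i++) {
            int v = values[i];
            if (v < 0 || v > 15) {
                throw new IllegalArgumentException("value out of 4-bit range: " + v);
            }
            // Even indexes go in the low nibble, odd indexes in the high nibble.
            int shift = (i & 1) == 0 ? 0 : 4;
            packed[i >>> 1] |= (byte) (v << shift);
        }
        return packed;
    }

    /** Reads back the value stored at the given index. */
    static int get(byte[] packed, int index) {
        int shift = (index & 1) == 0 ? 0 : 4;
        return (packed[index >>> 1] >>> shift) & 0x0F;
    }
}
```

Compared with one byte per value, the packed array is half the size (rounded
up), which is where the factor of 2 comes from.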