[ https://issues.apache.org/jira/browse/LUCENE-6863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adrien Grand updated LUCENE-6863:
---------------------------------
    Attachment: LUCENE-6863.patch

Updated patch that:
 - makes the code a bit more readable and adds comments
 - avoids loading a slice for values when only docs with field are requested
 - saves some monotonic lookups

Here is an updated result of the benchmark (still with a threshold of 5% for benchmarking purposes, even though the patch still has a threshold of 1%), computed exactly the same way as above. It makes the slowdown a bit more contained. Times are in ms.

||Field||sort performance on a MatchAllDocsQuery||sort performance on a term query that matches 10% of docs||sort performance on a term query that matches 1% of docs||sort performance on a term query that matches docs that have the field||
|cc2|128→99 ({color:green}-23%{color})|21.8→23.8 (+9%)|2.92→4.33 ({color:red}+48%{color})|6.84→13.0 ({color:red}+90%{color})|
|admin4|121→98 ({color:green}-19%{color})|21.4→21.1 (-1%)|3.65→2.81 ({color:green}-23%{color})|8.36→16.6 ({color:red}+98%{color})|
|admin3|116→125 (+1%)|20.6→20.0 (-3%)|3.20→3.24 (+1%)|18.9→19.4 (+8%)|
|admin2|124→132 (+6%)|21.5→20.6 (-4%)|3.30→3.49 (+6%)|8.58→8.64 (+1%)|

I think the change is good to go, but I know this can be controversial. Please let me know if you have concerns, otherwise I plan to commit it by the end of the week.

> Store sparse doc values more efficiently
> ----------------------------------------
>
>                 Key: LUCENE-6863
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6863
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>         Attachments: LUCENE-6863.patch, LUCENE-6863.patch, LUCENE-6863.patch
>
>
> For both NUMERIC fields and ordinals of SORTED fields, we store data in a
> dense way. As a consequence, if you have only 1000 documents out of 1B that
> have a value, and 8 bits are required to store those 1000 numbers, we will
> not require 1KB of storage, but 1GB.
> I suspect this mostly happens in abuse cases, but still it's a pity that we
> explode storage requirements. We could try to detect sparsity and compress
> accordingly.
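
To make the storage arithmetic in the issue description concrete, here is a minimal Java sketch. It is not taken from the patch: the class name `SparseDocValuesEstimate`, its methods, and the naive sparse layout (an explicit doc ID stored next to each value) are assumptions for illustration only. The 1% threshold mirrors the value mentioned for the patch in the comment above; how the patch actually applies it may differ.

```java
// Hypothetical helper, not part of Lucene: estimates storage for one field.
final class SparseDocValuesEstimate {

    private SparseDocValuesEstimate() {}

    // Dense layout: one slot per document in the segment, whether or not
    // the document has a value.
    static long denseBytes(long maxDoc, int bitsPerValue) {
        return (maxDoc * bitsPerValue + 7) / 8;
    }

    // Assumed naive sparse layout: only documents that have a value are
    // stored, each as an explicit doc ID followed by its value.
    static long sparseBytes(long docsWithValue, int bitsPerValue, int bitsPerDocId) {
        return (docsWithValue * (bitsPerValue + bitsPerDocId) + 7) / 8;
    }

    // Sparsity check: true when fewer than `threshold` (a fraction, e.g.
    // 0.01 for 1%) of the documents have a value.
    static boolean isSparse(long docsWithValue, long maxDoc, double threshold) {
        return docsWithValue < threshold * maxDoc;
    }

    public static void main(String[] args) {
        long maxDoc = 1_000_000_000L; // 1B documents, as in the description
        long docsWithValue = 1_000L;  // only 1000 of them have a value
        int bitsPerValue = 8;

        System.out.println("dense bytes:  " + denseBytes(maxDoc, bitsPerValue));
        System.out.println("sparse bytes: " + sparseBytes(docsWithValue, bitsPerValue, 32));
        System.out.println("sparse?       " + isSparse(docsWithValue, maxDoc, 0.01));
    }
}
```

With the numbers from the description, the dense layout costs 1,000,000,000 bytes (about 1GB), while the values alone would fit in 1000 bytes (about 1KB). Even the naive sparse layout assumed here, with 32-bit doc IDs, needs only 5000 bytes.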