[
https://issues.apache.org/jira/browse/LUCENE-10315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Feng Guo updated LUCENE-10315:
------------------------------
Description:
This issue proposes using a 512-int {{ForUtil}} for the BKD leaf block doc ID
codec. I benchmarked the change by indexing random LongPoint values and
querying them with PointInSetQuery.
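For readers unfamiliar with the idea, here is a minimal sketch of fixed-width bit packing over one block of 512 doc IDs, which is the general technique a {{ForUtil}}-style codec builds on. It is not the code from this issue or Lucene's actual {{ForUtil}}: the class {{PackedBlockSketch}} and all of its methods are hypothetical names, and the bit layout is chosen for readability rather than speed.

```java
import java.util.Arrays;
import java.util.Random;

/** Illustrative fixed-width bit packing for one block of doc IDs (hypothetical, not Lucene's ForUtil). */
final class PackedBlockSketch {
    /** Block size named in the issue title; assumed here to be one BKD leaf block. */
    static final int BLOCK_SIZE = 512;

    /** Smallest bit width (at least 1) that can hold every value in the block. */
    static int bitsRequired(int[] values) {
        int or = 0;
        for (int v : values) {
            if (v < 0) {
                throw new IllegalArgumentException("doc IDs must be non-negative");
            }
            or |= v;
        }
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(or));
    }

    /** Packs each value into bitsPerValue bits, laid out back to back in 64-bit words. */
    static long[] pack(int[] values, int bitsPerValue) {
        long[] packed = new long[(values.length * bitsPerValue + 63) / 64];
        for (int i = 0; i < values.length; i++) {
            long bitPos = (long) i * bitsPerValue;
            int word = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            packed[word] |= ((long) values[i]) << shift;
            if (shift + bitsPerValue > 64) {
                // The value straddles two words: store its high bits in the next word.
                packed[word + 1] |= ((long) values[i]) >>> (64 - shift);
            }
        }
        return packed;
    }

    /** Reverses pack(): reads count values of bitsPerValue bits each. */
    static int[] unpack(long[] packed, int count, int bitsPerValue) {
        int[] values = new int[count];
        long mask = (1L << bitsPerValue) - 1;
        for (int i = 0; i < count; i++) {
            long bitPos = (long) i * bitsPerValue;
            int word = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            long v = packed[word] >>> shift;
            if (shift + bitsPerValue > 64) {
                v |= packed[word + 1] << (64 - shift);
            }
            values[i] = (int) (v & mask);
        }
        return values;
    }

    public static void main(String[] args) {
        // Random doc IDs below the benchmark's doc count of 100000000.
        Random random = new Random(42);
        int[] docIds = new int[BLOCK_SIZE];
        for (int i = 0; i < docIds.length; i++) {
            docIds[i] = random.nextInt(100_000_000);
        }
        int bits = bitsRequired(docIds);
        long[] packed = pack(docIds, bits);
        int[] decoded = unpack(packed, docIds.length, bits);
        System.out.println("bits per value: " + bits);
        System.out.println("packed longs:   " + packed.length);
        System.out.println("round trip ok:  " + Arrays.equals(docIds, decoded));
    }
}
```

Because every value in the block uses the same width, decoding is a tight loop with no per-value branching on length, which is the property that makes this family of codecs fast to decode.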
*Benchmark Result*
|doc count|field cardinality|query point|baseline QPS|candidate QPS|diff percentage|
|100000000|32|1|51.44|148.26|188.22%|
|100000000|32|2|26.8|101.88|280.15%|
|100000000|32|4|14.04|53.52|281.20%|
|100000000|32|8|7.04|28.54|305.40%|
|100000000|32|16|3.54|14.61|312.71%|
|100000000|128|1|110.56|350.26|216.81%|
|100000000|128|8|16.6|89.81|441.02%|
|100000000|128|16|8.45|48.07|468.88%|
|100000000|128|32|4.2|25.35|503.57%|
|100000000|128|64|2.13|13.02|511.27%|
|100000000|1024|1|536.19|843.88|57.38%|
|100000000|1024|8|109.71|251.89|129.60%|
|100000000|1024|32|33.24|104.11|213.21%|
|100000000|1024|128|8.87|30.47|243.52%|
|100000000|1024|512|2.24|8.3|270.54%|
|100000000|8192|1|3333.33|5000|50.00%|
|100000000|8192|32|139.47|214.59|53.86%|
|100000000|8192|128|54.59|109.23|100.09%|
|100000000|8192|512|15.61|36.15|131.58%|
|100000000|8192|2048|4.11|11.14|171.05%|
|100000000|1048576|1|2597.4|3030.3|16.67%|
|100000000|1048576|32|314.96|371.75|18.03%|
|100000000|1048576|128|99.7|116.28|16.63%|
|100000000|1048576|512|30.5|37.15|21.80%|
|100000000|1048576|2048|10.38|12.3|18.50%|
|100000000|8388608|1|2564.1|3174.6|23.81%|
|100000000|8388608|32|196.27|238.95|21.75%|
|100000000|8388608|128|55.36|68.03|22.89%|
|100000000|8388608|512|15.58|19.24|23.49%|
|100000000|8388608|2048|4.56|5.71|25.22%|
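The diff percentage column is consistent with the relative change of candidate QPS over baseline QPS. A small sketch of that arithmetic, using the first row of the table; the class and method names are hypothetical and only illustrate the formula:

```java
import java.util.Locale;

/** Recomputes the "diff percentage" column from the two QPS columns (illustrative helper). */
final class QpsDiff {
    /** Relative change of candidate over baseline, in percent. */
    static double diffPercent(double baselineQps, double candidateQps) {
        return (candidateQps - baselineQps) / baselineQps * 100.0;
    }

    /** Throughput ratio; a 100% diff corresponds to a 2x speedup. */
    static double speedup(double baselineQps, double candidateQps) {
        return candidateQps / baselineQps;
    }

    public static void main(String[] args) {
        // First table row: cardinality 32, 1 query point.
        System.out.printf(Locale.ROOT, "%.2f%%%n", diffPercent(51.44, 148.26)); // prints 188.22%
        System.out.printf(Locale.ROOT, "%.2fx%n", speedup(51.44, 148.26));      // prints 2.88x
    }
}
```

Read this way, a diff of 188.22% means the candidate serves about 2.88 times as many queries per second as the baseline, not 1.88 times.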
Index size shrinks for the 128, 1024, and 8192 cardinality fields (by about
10% at cardinality 1024) and stays within 1M of the baseline for the others.
{code:java}
113M index_100000000_doc_32_cardinality_baseline
114M index_100000000_doc_32_cardinality_candidate
140M index_100000000_doc_128_cardinality_baseline
133M index_100000000_doc_128_cardinality_candidate
193M index_100000000_doc_1024_cardinality_baseline
174M index_100000000_doc_1024_cardinality_candidate
241M index_100000000_doc_8192_cardinality_baseline
233M index_100000000_doc_8192_cardinality_candidate
314M index_100000000_doc_1048576_cardinality_baseline
315M index_100000000_doc_1048576_cardinality_candidate
392M index_100000000_doc_8388608_cardinality_baseline
391M index_100000000_doc_8388608_cardinality_candidate
{code}
> Speed up BKD leaf block ids codec by a 512 ints ForUtil
> -------------------------------------------------------
>
> Key: LUCENE-10315
> URL: https://issues.apache.org/jira/browse/LUCENE-10315
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Feng Guo
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h