[ 
https://issues.apache.org/jira/browse/LUCENE-10315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feng Guo updated LUCENE-10315:
------------------------------
    Description: 
This issue tried to use a 512 ints {{ForUtil}} for BKD ids codec. I benchmarked 
this optimization by mocking some random LongPoint and querying them with 
PointInSetQuery.


*Benchmark Result*
|doc count|field cardinality|query point|baseline QPS|candidate QPS|diff 
percentage|
|100000000|32|1|51.44|148.26|188.22%|
|100000000|32|2|26.8|101.88|280.15%|
|100000000|32|4|14.04|53.52|281.20%|
|100000000|32|8|7.04|28.54|305.40%|
|100000000|32|16|3.54|14.61|312.71%|
|100000000|128|1|110.56|350.26|216.81%|
|100000000|128|8|16.6|89.81|441.02%|
|100000000|128|16|8.45|48.07|468.88%|
|100000000|128|32|4.2|25.35|503.57%|
|100000000|128|64|2.13|13.02|511.27%|
|100000000|1024|1|536.19|843.88|57.38%|
|100000000|1024|8|109.71|251.89|129.60%|
|100000000|1024|32|33.24|104.11|213.21%|
|100000000|1024|128|8.87|30.47|243.52%|
|100000000|1024|512|2.24|8.3|270.54%|
|100000000|8192|1|3333.33|5000|50.00%|
|100000000|8192|32|139.47|214.59|53.86%|
|100000000|8192|128|54.59|109.23|100.09%|
|100000000|8192|512|15.61|36.15|131.58%|
|100000000|8192|2048|4.11|11.14|171.05%|
|100000000|1048576|1|2597.4|3030.3|16.67%|
|100000000|1048576|32|314.96|371.75|18.03%|
|100000000|1048576|128|99.7|116.28|16.63%|
|100000000|1048576|512|30.5|37.15|21.80%|
|100000000|1048576|2048|10.38|12.3|18.50%|
|100000000|8388608|1|2564.1|3174.6|23.81%|
|100000000|8388608|32|196.27|238.95|21.75%|
|100000000|8388608|128|55.36|68.03|22.89%|
|100000000|8388608|512|15.58|19.24|23.49%|
|100000000|8388608|2048|4.56|5.71|25.22%|

The indices size is reduced for low cardinality fields and flat for high 
cardinality fields.  

{code:java}
113M    index_100000000_doc_32_cardinality_baseline
114M    index_100000000_doc_32_cardinality_candidate

140M    index_100000000_doc_128_cardinality_baseline
133M    index_100000000_doc_128_cardinality_candidate

193M    index_100000000_doc_1024_cardinality_baseline
174M    index_100000000_doc_1024_cardinality_candidate

241M    index_100000000_doc_8192_cardinality_baseline
233M    index_100000000_doc_8192_cardinality_candidate

314M    index_100000000_doc_1048576_cardinality_baseline
315M    index_100000000_doc_1048576_cardinality_candidate

392M    index_100000000_doc_8388608_cardinality_baseline
391M    index_100000000_doc_8388608_cardinality_candidate
{code}


  was:
This issue tried to use a 512 ints {{ForUtil}} for BKD ids codec. I benchmarked 
this optimization by mocking some random LongPoint and querying them with 
PointInSetQuery.


*Benchmark Result*
|doc count|field cardinality|query point|baseline QPS|candidate QPS|diff 
percentage|
|100000000|32|1|51.44|148.26|188.22%|
|100000000|32|2|26.8|101.88|280.15%|
|100000000|32|4|14.04|53.52|281.20%|
|100000000|32|8|7.04|28.54|305.40%|
|100000000|32|16|3.54|14.61|312.71%|
|100000000|128|1|110.56|350.26|216.81%|
|100000000|128|8|16.6|89.81|441.02%|
|100000000|128|16|8.45|48.07|468.88%|
|100000000|128|32|4.2|25.35|503.57%|
|100000000|128|64|2.13|13.02|511.27%|
|100000000|1024|1|536.19|843.88|57.38%|
|100000000|1024|8|109.71|251.89|129.60%|
|100000000|1024|32|33.24|104.11|213.21%|
|100000000|1024|128|8.87|30.47|243.52%|
|100000000|1024|512|2.24|8.3|270.54%|
|100000000|8192|1|3333.33|5000|50.00%|
|100000000|8192|32|139.47|214.59|53.86%|
|100000000|8192|128|54.59|109.23|100.09%|
|100000000|8192|512|15.61|36.15|131.58%|
|100000000|8192|2048|4.11|11.14|171.05%|
|100000000|1048576|1|2597.4|3030.3|16.67%|
|100000000|1048576|32|314.96|371.75|18.03%|
|100000000|1048576|128|99.7|116.28|16.63%|
|100000000|1048576|512|30.5|37.15|21.80%|
|100000000|1048576|2048|10.38|12.3|18.50%|
|100000000|8388608|1|2564.1|3174.6|23.81%|
|100000000|8388608|32|196.27|238.95|21.75%|
|100000000|8388608|128|55.36|68.03|22.89%|
|100000000|8388608|512|15.58|19.24|23.49%|
|100000000|8388608|2048|4.56|5.71|25.22%|

The indices size is reduced for low cardinality fields and flat for high 
cardinality fields.  

{code:java}
113M    index_100000000_doc_32_cardinality_baseline
114M    index_100000000_doc_32_cardinality_candidate

140M    index_100000000_doc_128_cardinality_baseline
133M    index_100000000_doc_128_cardinality_candidate

241M    index_100000000_doc_8192_cardinality_baseline
233M    index_100000000_doc_8192_cardinality_candidate

193M    index_100000000_doc_1024_cardinality_baseline
174M    index_100000000_doc_1024_cardinality_candidate

314M    index_100000000_doc_1048576_cardinality_baseline
315M    index_100000000_doc_1048576_cardinality_candidate

392M    index_100000000_doc_8388608_cardinality_baseline
391M    index_100000000_doc_8388608_cardinality_candidate
{code}



> Speed up BKD leaf block ids codec by a 512 ints ForUtil
> -------------------------------------------------------
>
>                 Key: LUCENE-10315
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10315
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Feng Guo
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> This issue tried to use a 512 ints {{ForUtil}} for BKD ids codec. I 
> benchmarked this optimization by mocking some random LongPoint and querying 
> them with PointInSetQuery.
> *Benchmark Result*
> |doc count|field cardinality|query point|baseline QPS|candidate QPS|diff 
> percentage|
> |100000000|32|1|51.44|148.26|188.22%|
> |100000000|32|2|26.8|101.88|280.15%|
> |100000000|32|4|14.04|53.52|281.20%|
> |100000000|32|8|7.04|28.54|305.40%|
> |100000000|32|16|3.54|14.61|312.71%|
> |100000000|128|1|110.56|350.26|216.81%|
> |100000000|128|8|16.6|89.81|441.02%|
> |100000000|128|16|8.45|48.07|468.88%|
> |100000000|128|32|4.2|25.35|503.57%|
> |100000000|128|64|2.13|13.02|511.27%|
> |100000000|1024|1|536.19|843.88|57.38%|
> |100000000|1024|8|109.71|251.89|129.60%|
> |100000000|1024|32|33.24|104.11|213.21%|
> |100000000|1024|128|8.87|30.47|243.52%|
> |100000000|1024|512|2.24|8.3|270.54%|
> |100000000|8192|1|3333.33|5000|50.00%|
> |100000000|8192|32|139.47|214.59|53.86%|
> |100000000|8192|128|54.59|109.23|100.09%|
> |100000000|8192|512|15.61|36.15|131.58%|
> |100000000|8192|2048|4.11|11.14|171.05%|
> |100000000|1048576|1|2597.4|3030.3|16.67%|
> |100000000|1048576|32|314.96|371.75|18.03%|
> |100000000|1048576|128|99.7|116.28|16.63%|
> |100000000|1048576|512|30.5|37.15|21.80%|
> |100000000|1048576|2048|10.38|12.3|18.50%|
> |100000000|8388608|1|2564.1|3174.6|23.81%|
> |100000000|8388608|32|196.27|238.95|21.75%|
> |100000000|8388608|128|55.36|68.03|22.89%|
> |100000000|8388608|512|15.58|19.24|23.49%|
> |100000000|8388608|2048|4.56|5.71|25.22%|
> The indices size is reduced for low cardinality fields and flat for high 
> cardinality fields.  
> {code:java}
> 113M    index_100000000_doc_32_cardinality_baseline
> 114M    index_100000000_doc_32_cardinality_candidate
> 140M    index_100000000_doc_128_cardinality_baseline
> 133M    index_100000000_doc_128_cardinality_candidate
> 193M    index_100000000_doc_1024_cardinality_baseline
> 174M    index_100000000_doc_1024_cardinality_candidate
> 241M    index_100000000_doc_8192_cardinality_baseline
> 233M    index_100000000_doc_8192_cardinality_candidate
> 314M    index_100000000_doc_1048576_cardinality_baseline
> 315M    index_100000000_doc_1048576_cardinality_candidate
> 392M    index_100000000_doc_8388608_cardinality_baseline
> 391M    index_100000000_doc_8388608_cardinality_candidate
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to