[
https://issues.apache.org/jira/browse/ARROW-11630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
benwang li updated ARROW-11630:
-------------------------------
Description:
1. Use partial_sort for queries with limit expression.
{code:java}
//代码占位符
select number from table order by numbder limit 3;{code}
We can use partial_sort (can be implemented in BinaryHeap). This can
significantly improve the sorting performance in sort && limit queries
Refer:
[https://github.com/ClickHouse/ClickHouse/blob/f669a9f97ad850edb77d10e51cd0c41a4af737bf/src/Columns/ColumnVector.cpp#L137-L145]
[
|https://github.com/ClickHouse/ClickHouse/blob/f669a9f97ad850edb77d10e51cd0c41a4af737bf/src/Columns/ColumnVector.cpp#L137-L145]
2. Use pqdsort for Primitive arrays. this is already done in ClickHouse, Refer:
[https://github.com/ClickHouse/ClickHouse/blob/f669a9f97ad850edb77d10e51cd0c41a4af737bf/src/Columns/ColumnVector.cpp#L188-L191]
was:
This is already done in ClickHouse,
[https://github.com/ClickHouse/ClickHouse/blob/f669a9f97ad850edb77d10e51cd0c41a4af737bf/src/Columns/ColumnVector.cpp#L188-L191]
In my server benchmark, it proves out to be better performance to use pdqsort
for value_indices.
{code:java}
//代码占位符
cargo bench --bench sort_kernel
Gnuplot not found, using plotters backend
sort 2^10 time: [226.90 us 227.32 us 227.78 us]
change: [-12.867% -12.670% -12.463%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mildBenchmarking sort 2^12: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase
target time to 5.7s, enable flat sampling, or reduce sample count to 60.
sort 2^12 time: [1.1264 ms 1.1280 ms 1.1298 ms]
change: [-13.544% -13.131% -12.651%] (p = 0.00 < 0.05)
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
3 (3.00%) high mild
5 (5.00%) high severesort nulls 2^10 time: [172.19 us 178.21 us
184.20 us] ^[[A^[[A^[[A^[[A^[[B^B[
change: [-24.136% -21.848% -19.509%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
sort nulls 2^12 time: [760.54 us 762.53 us 764.95 us]
change: [-29.358% -28.483% -27.749%] (p = 0.00 < 0.05)
Performance has improved.
Gnuplot not found, using plotters backend
sort 2^10 time: [226.90 us 227.32 us 227.78 us]
change: [-12.867% -12.670% -12.463%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mildBenchmarking sort 2^12: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase
target time to 5.7s, enable flat sampling, or reduce sample count to 60.
sort 2^12 time: [1.1264 ms 1.1280 ms 1.1298 ms]
change: [-13.544% -13.131% -12.651%] (p = 0.00 < 0.05)
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
3 (3.00%) high mild
5 (5.00%) high severesort nulls 2^10 time: [172.19 us 178.21 us
184.20 us]
change: [-24.136% -21.848% -19.509%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
sort nulls 2^12 time: [760.54 us 762.53 us 764.95 us]
change: [-29.358% -28.483% -27.749%] (p = 0.00 < 0.05)
Performance has improved.{code}
> [rust] sort performance
> -----------------------
>
> Key: ARROW-11630
> URL: https://issues.apache.org/jira/browse/ARROW-11630
> Project: Apache Arrow
> Issue Type: Bug
> Reporter: benwang li
> Priority: Major
>
>
> 1. Use partial_sort for queries with limit expression.
> {code:java}
> //代码占位符
> select number from table order by numbder limit 3;{code}
> We can use partial_sort (can be implemented in BinaryHeap). This can
> significantly improve the sorting performance in sort && limit queries
>
> Refer:
> [https://github.com/ClickHouse/ClickHouse/blob/f669a9f97ad850edb77d10e51cd0c41a4af737bf/src/Columns/ColumnVector.cpp#L137-L145]
> [
> |https://github.com/ClickHouse/ClickHouse/blob/f669a9f97ad850edb77d10e51cd0c41a4af737bf/src/Columns/ColumnVector.cpp#L137-L145]
> 2. Use pqdsort for Primitive arrays. this is already done in ClickHouse,
> Refer:
> [https://github.com/ClickHouse/ClickHouse/blob/f669a9f97ad850edb77d10e51cd0c41a4af737bf/src/Columns/ColumnVector.cpp#L188-L191]
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)