[ 
https://issues.apache.org/jira/browse/ARROW-11630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

benwang li updated ARROW-11630:
-------------------------------
    Description: 
 

1.  Use partial_sort for queries with limit expression. 
{code:java}
//代码占位符
select number from table order by numbder limit 3;{code}
We can use partial_sort (can be implemented in BinaryHeap). This can 
significantly improve the sorting performance in sort && limit queries
 

Refer: 
[https://github.com/ClickHouse/ClickHouse/blob/f669a9f97ad850edb77d10e51cd0c41a4af737bf/src/Columns/ColumnVector.cpp#L137-L145]
 [ 
|https://github.com/ClickHouse/ClickHouse/blob/f669a9f97ad850edb77d10e51cd0c41a4af737bf/src/Columns/ColumnVector.cpp#L137-L145]





2. Use pqdsort for Primitive arrays. this is already done in ClickHouse, Refer: 
[https://github.com/ClickHouse/ClickHouse/blob/f669a9f97ad850edb77d10e51cd0c41a4af737bf/src/Columns/ColumnVector.cpp#L188-L191]

 

 

  was:
This is already done in ClickHouse, 
[https://github.com/ClickHouse/ClickHouse/blob/f669a9f97ad850edb77d10e51cd0c41a4af737bf/src/Columns/ColumnVector.cpp#L188-L191]

In my server benchmark, it proves out to be better performance to use pdqsort 
for value_indices.

 
{code:java}
//代码占位符
  
cargo bench --bench  sort_kernel

Gnuplot not found, using plotters backend
sort 2^10               time:   [226.90 us 227.32 us 227.78 us]
                        change: [-12.867% -12.670% -12.463%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mildBenchmarking sort 2^12: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase 
target time to 5.7s, enable flat sampling, or reduce sample count to 60.
sort 2^12               time:   [1.1264 ms 1.1280 ms 1.1298 ms]
                        change: [-13.544% -13.131% -12.651%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) high mild
  5 (5.00%) high severesort nulls 2^10         time:   [172.19 us 178.21 us 
184.20 us]                            ^[[A^[[A^[[A^[[A^[[B^B[
                        change: [-24.136% -21.848% -19.509%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
  
sort nulls 2^12         time:   [760.54 us 762.53 us 764.95 us]
                        change: [-29.358% -28.483% -27.749%] (p = 0.00 < 0.05)
                        Performance has improved.
  

Gnuplot not found, using plotters backend
sort 2^10               time:   [226.90 us 227.32 us 227.78 us]
                        change: [-12.867% -12.670% -12.463%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mildBenchmarking sort 2^12: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase 
target time to 5.7s, enable flat sampling, or reduce sample count to 60.
sort 2^12               time:   [1.1264 ms 1.1280 ms 1.1298 ms]
                        change: [-13.544% -13.131% -12.651%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) high mild
  5 (5.00%) high severesort nulls 2^10         time:   [172.19 us 178.21 us 
184.20 us]
                        change: [-24.136% -21.848% -19.509%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
  
sort nulls 2^12         time:   [760.54 us 762.53 us 764.95 us]
                        change: [-29.358% -28.483% -27.749%] (p = 0.00 < 0.05)
                        Performance has improved.{code}


> [rust] sort performance
> -----------------------
>
>                 Key: ARROW-11630
>                 URL: https://issues.apache.org/jira/browse/ARROW-11630
>             Project: Apache Arrow
>          Issue Type: Bug
>            Reporter: benwang li
>            Priority: Major
>
>  
> 1.  Use partial_sort for queries with limit expression. 
> {code:java}
> //代码占位符
> select number from table order by numbder limit 3;{code}
> We can use partial_sort (can be implemented in BinaryHeap). This can 
> significantly improve the sorting performance in sort && limit queries
>  
> Refer: 
> [https://github.com/ClickHouse/ClickHouse/blob/f669a9f97ad850edb77d10e51cd0c41a4af737bf/src/Columns/ColumnVector.cpp#L137-L145]
>  [ 
> |https://github.com/ClickHouse/ClickHouse/blob/f669a9f97ad850edb77d10e51cd0c41a4af737bf/src/Columns/ColumnVector.cpp#L137-L145]
> 2. Use pqdsort for Primitive arrays. this is already done in ClickHouse, 
> Refer: 
> [https://github.com/ClickHouse/ClickHouse/blob/f669a9f97ad850edb77d10e51cd0c41a4af737bf/src/Columns/ColumnVector.cpp#L188-L191]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to