HaoYang670 opened a new issue #1373:
URL: https://github.com/apache/arrow-rs/issues/1373


   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
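   The function `min_max_string` does some unnecessary work when `null_count > 0`. 
For example, it checks `has_value` on every iteration, even though the flag is 
always `true` once the first valid value has been found. It also calls 
`array.value(i)` before checking `data.is_valid(i)`, so null slots are read too. 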
   ```rust
       if null_count == 0 {
           n = array.value(0);
           for i in 1..data.len() {
               let item = array.value(i);
               if cmp(n, item) {
                   n = item;
               }
           }
       } else {
           n = "";
           let mut has_value = false;
   
           for i in 0..data.len() {
               let item = array.value(i);
               if data.is_valid(i) && (!has_value || cmp(n, item)) {
                   has_value = true;
                   n = item;
               }
           }
       }
   ```
   
https://github.com/apache/arrow-rs/blob/master/arrow/src/compute/kernels/aggregate.rs#L55-L64
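
One way the per-iteration `has_value` check could be avoided is to locate the first valid slot up front and then scan the remainder with only the validity check. The sketch below is a std-only illustration, not the arrow-rs code: it models the array as a slice of `Option<&str>` (with `None` standing in for a null slot), and the name `min_max_with_nulls` is hypothetical.

```rust
/// Hypothetical stand-in for the null-aware branch of `min_max_string`.
/// `values` models a string array where `None` is a null slot (an assumption,
/// not arrow's `StringArray`). `cmp(a, b)` returns true when `b` should
/// replace the current best value `a`.
fn min_max_with_nulls<'a, F>(values: &[Option<&'a str>], cmp: F) -> Option<&'a str>
where
    F: Fn(&str, &str) -> bool,
{
    // Find the first non-null slot; if every slot is null, return None.
    let first = values.iter().position(|v| v.is_some())?;
    let mut n = values[first]?;

    // A current best always exists from here on, so no `has_value` flag is
    // needed; `flatten` skips the remaining null slots.
    for item in values[first + 1..].iter().flatten().copied() {
        if cmp(n, item) {
            n = item;
        }
    }
    Some(n)
}
```

With `cmp` set to `|a, b| a > b` this returns the minimum, and with `|a, b| a < b` the maximum. Whether this is actually faster on real arrays would need to be confirmed with a benchmark.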
   
   Apart from that, I would like this function to be cleaned up, because the 
`for` loops here are not very readable.
   
   **Describe the solution you'd like**
   1. Improve performance when `null_count > 0`.
   2. Introduce no performance penalty in the other cases.
   3. Clean up the code, perhaps using a more functional style (e.g. iterators).
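
As a rough idea of what a more functional version might look like, the sketch below folds over the non-null values with `Iterator::reduce`. It is a std-only illustration under the same assumption as before: the array is modelled as a slice of `Option<&str>`, not arrow's `StringArray`, and the name `min_max_fp` is hypothetical.

```rust
/// Hypothetical iterator-based variant. `values` models a string array where
/// `None` is a null slot (an assumption for illustration). `cmp(a, b)`
/// returns true when `b` should replace the current best value `a`.
fn min_max_fp<'a, F>(values: &[Option<&'a str>], cmp: F) -> Option<&'a str>
where
    F: Fn(&str, &str) -> bool,
{
    values
        .iter()
        .flatten() // skip null slots
        .copied()
        // `reduce` seeds the accumulator with the first valid value, so
        // neither a placeholder like "" nor a `has_value` flag is needed.
        .reduce(|n, item| if cmp(n, item) { item } else { n })
}
```

This handles the all-null and empty cases by returning `None`. It does not by itself satisfy goal 2: the `null_count == 0` fast path would still need to be measured against it.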
   
   **Describe alternatives you've considered**
   We could also replace `array.value(i)` with `array.value_unchecked(i)` to skip 
the bounds check, but that would introduce `unsafe` code, so I am not sure it is 
worth it. 
   

