neilconway opened a new pull request, #21101:
URL: https://github.com/apache/datafusion/pull/21101

   ## Which issue does this PR close?
   
   - Closes #21100.
   
   ## Rationale for this change
   
   In the current `array_min` / `array_max` implementation, we build a `PrimitiveArray` for each row, pass it to the Arrow `min` / `max` kernel, and collect the resulting `ScalarValue`s in a `Vec`. We then build the final `PrimitiveArray` result by calling `ScalarValue::iter_to_array` on that `Vec`.
   
   We can do better for `ListArray`s of primitive types. First, we can iterate over the batch's flat values buffer and compute each row's min/max directly from its slice of that buffer (as delimited by the list offsets), instead of materializing a separate array per row. Second, Arrow's `min` / `max` kernels have non-trivial per-call overhead; for small arrays, it is cheaper to compute the min/max ourselves with a simple loop.
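To make the first idea concrete, here is a minimal, std-only Rust sketch. It deliberately does not use arrow-rs: the hypothetical `ListI64` struct models a `ListArray` of `Int64` as an offsets vector, a flat values buffer, and optional validity vectors (real Arrow validity is a packed bitmap, not a `Vec<bool>`). The NULL semantics here (a null row, an empty row, or a row whose elements are all null yields `None`) are assumptions of the sketch, not a statement of DataFusion's exact behavior. A `min` variant would be identical apart from calling `min()`.

```rust
/// Hypothetical, simplified model of an Arrow `ListArray` of `Int64` (not the arrow-rs API).
/// Row `i` spans `values[offsets[i]..offsets[i + 1]]`.
struct ListI64 {
    offsets: Vec<usize>,            // num_rows + 1 entries
    values: Vec<i64>,               // flat child values buffer shared by all rows
    value_valid: Option<Vec<bool>>, // child validity; None means no null elements
    row_valid: Option<Vec<bool>>,   // list-level validity; None means no null rows
}

/// Per-row maximum computed by scanning each row's slice of the flat values
/// buffer, with no per-row array allocation and no kernel call.
/// Assumed semantics: null row, empty row, or all-null row => None.
fn array_max_fast(list: &ListI64) -> Vec<Option<i64>> {
    list.offsets
        .windows(2)
        .enumerate()
        .map(|(row, w)| {
            // A null list row produces a NULL result.
            if let Some(rows) = &list.row_valid {
                if !rows[row] {
                    return None;
                }
            }
            let (start, end) = (w[0], w[1]);
            let slice = &list.values[start..end];
            match &list.value_valid {
                // No null elements: a plain scan over the slice.
                None => slice.iter().copied().max(),
                // Null elements present: skip them using the child validity.
                Some(valid) => slice
                    .iter()
                    .zip(&valid[start..end])
                    .filter(|&(_, &ok)| ok)
                    .map(|(&v, _)| v)
                    .max(),
            }
        })
        .collect()
}

fn main() {
    // Rows: [3, 1, 2], [], NULL, [5, NULL, 4]
    let list = ListI64 {
        offsets: vec![0, 3, 3, 3, 6],
        values: vec![3, 1, 2, 5, 0, 4],
        value_valid: Some(vec![true, true, true, true, false, true]),
        row_valid: Some(vec![true, true, false, true]),
    };
    println!("{:?}", array_max_fast(&list)); // [Some(3), None, None, Some(5)]
}
```

Because each row is just a range of the shared values buffer, the per-row cost is a bounds lookup plus a loop, which is where the large speedups for short lists would come from.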
   
   Benchmarks (1000-row batch, lists of `Int64` values, M4 Max):
   
     - no_nulls / list_size=10: 42.0 µs → 3.2 µs (13.1x faster)
     - no_nulls / list_size=100: 52.0 µs → 17.9 µs (2.9x faster)
     - no_nulls / list_size=1000: 144.3 µs → 107.5 µs (1.3x faster)
     - nulls / list_size=10: 48.8 µs → 7.6 µs (6.4x faster)
     - nulls / list_size=100: 97.2 µs → 70.8 µs (1.4x faster)
     - nulls / list_size=1000: 654.0 µs → 633.1 µs (1.03x faster)
   
   ## What changes are included in this PR?
   
   * Add a benchmark for `array_max`
   * Expand SLT test coverage
   * Implement the fast path for `ListArray`s of primitive types
   
   ## Are these changes tested?
   
   Yes, via the expanded SLT coverage.
   
   ## Are there any user-facing changes?
   
   No.
   