yordan-pavlov opened a new pull request #8660:
URL: https://github.com/apache/arrow/pull/8660


   This PR addresses the inefficient comparison against scalar values, where an array is first built with the scalar value repeated. It does so by changing the return type of expressions from `Result<ArrayRef>` to `Result<ColumnarValue>`, where `ColumnarValue` is defined as:
   ```
   pub enum ColumnarValue {
       /// Array of values
       Array(ArrayRef),
       /// A single value 
       Scalar(ScalarValue)
   }
   ```
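
As a rough illustration of why this helps, here is a minimal sketch of a comparison that dispatches on `ColumnarValue`. It is not the code in this PR: `ArrayRef` is replaced by a plain `Arc<Vec<f32>>` stand-in, `ScalarValue` is cut down to a single `Float32` variant, and `gt_eq` is a hypothetical helper name, all so that the example compiles with the standard library alone.

```rust
use std::sync::Arc;

// Stand-in for Arrow's ArrayRef (assumption: the real type is Arc<dyn Array>).
type ArrayRef = Arc<Vec<f32>>;

// Stand-in for ScalarValue, reduced to one variant for illustration.
#[derive(Debug, Clone, PartialEq)]
pub enum ScalarValue {
    Float32(f32),
}

pub enum ColumnarValue {
    /// Array of values
    Array(ArrayRef),
    /// A single value
    Scalar(ScalarValue),
}

/// Hypothetical helper: evaluates `left >= right` element-wise.
/// A scalar operand is compared in place, without first being
/// expanded into an array of repeated values.
pub fn gt_eq(left: &ColumnarValue, right: &ColumnarValue) -> Vec<bool> {
    match (left, right) {
        // array >= array: compare element pairs
        (ColumnarValue::Array(l), ColumnarValue::Array(r)) => {
            l.iter().zip(r.iter()).map(|(a, b)| a >= b).collect()
        }
        // array >= scalar: reuse the one scalar for every element
        (ColumnarValue::Array(l), ColumnarValue::Scalar(ScalarValue::Float32(s))) => {
            l.iter().map(|a| a >= s).collect()
        }
        // scalar >= array: same idea with the operands swapped
        (ColumnarValue::Scalar(ScalarValue::Float32(s)), ColumnarValue::Array(r)) => {
            r.iter().map(|b| s >= b).collect()
        }
        // scalar >= scalar: a single result
        (
            ColumnarValue::Scalar(ScalarValue::Float32(a)),
            ColumnarValue::Scalar(ScalarValue::Float32(b)),
        ) => vec![a >= b],
    }
}
```

In this sketch the array-versus-scalar arms never allocate a second input array, which is the saving the change is aiming for.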
   
   This enables scalar values to be used in comparison operators directly. For the simple query used in the benchmark (`select f32, f64 from t where f32 >= 250 and f64 > 250`), this yields an approximately 10x performance improvement:
   
   before:
   filter_scalar time: [35.733 ms 36.613 ms 37.924 ms]
   
   after:
   filter_scalar time: [3.5938 ms 3.6450 ms 3.7035 ms]
   change: [-90.048% -89.846% -89.625%] (p = 0.00 < 0.05)
   
   
   I have also added a benchmark to measure the change in performance when comparing two arrays (using the query `select f32, f64 from t where f32 >= f64`). The difference is small, roughly a 3.6% slowdown:
   
   before:
   filter_array time: [11.601 ms 11.656 ms 11.718 ms]
   
   after:
   filter_array time: [11.854 ms 11.957 ms 12.070 ms]
   change: [+1.8032% +3.6391% +5.5671%] (p = 0.00 < 0.05)
   
   @andygrove @alamb let me know what you think
   
   


