goldmedal commented on issue #11546:
URL: https://github.com/apache/datafusion/issues/11546#issuecomment-2241158991
Here is the benchmark result after removing `make_array` (I also pushed a
new commit to the draft PR):
```
map_1000                time:   [10.105 ms 10.168 ms 10.233 ms]
                        change: [+0.0989% +1.5780% +2.8979%] (p = 0.02 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
map_one_1000            time:   [44.081 ms 45.278 ms 46.808 ms]
                        change: [+1.8229% +4.9320% +8.3942%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) high mild
  6 (6.00%) high severe
```
I think the result is really bad, so I tried to understand why `make_array`
is efficient. I noticed that it uses `make_scalar_function` to handle the
`ColumnarValue` arguments. My guess is that this could be more efficient than
`ScalarValue::into_array`.
https://github.com/apache/datafusion/blob/5da7ab300215c44ca5dc16771091890de22af99b/datafusion/functions-array/src/make_array.rs#L102-L104
I will try modifying both versions to use this approach and post another
benchmark.
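
To make the guess about `make_scalar_function` concrete, here is a rough, self-contained sketch of the pattern as I understand it. Everything in it is a simplified stand-in and not DataFusion's real code: `ColumnarValue` is a toy enum over `i64`, and `wrap_array_kernel` and `add_kernel` are hypothetical names. The assumption it illustrates is that such a wrapper keeps an all-scalar call as a one-row computation and returns a scalar, instead of expanding every scalar to a full-length array first.

```rust
// Stand-in types for illustration only; NOT DataFusion's real definitions.
#[derive(Debug, Clone, PartialEq)]
enum ColumnarValue {
    Array(Vec<i64>), // stands in for an Arrow array
    Scalar(i64),     // stands in for a scalar value
}

/// Hypothetical wrapper in the spirit of `make_scalar_function`: adapts a
/// kernel that only understands arrays so it accepts `ColumnarValue`s.
fn wrap_array_kernel<F>(inner: F) -> impl Fn(&[ColumnarValue]) -> ColumnarValue
where
    F: Fn(&[Vec<i64>]) -> Vec<i64>,
{
    move |args: &[ColumnarValue]| {
        assert!(!args.is_empty(), "this sketch expects at least one argument");

        // Length of the first array argument, if there is one.
        let len = args.iter().find_map(|a| match a {
            ColumnarValue::Array(v) => Some(v.len()),
            ColumnarValue::Scalar(_) => None,
        });
        let all_scalar = len.is_none();
        // With only scalars, a single row is enough.
        let rows = len.unwrap_or(1);

        // Expand scalars only as far as the kernel needs.
        let arrays: Vec<Vec<i64>> = args
            .iter()
            .map(|a| match a {
                ColumnarValue::Array(v) => v.clone(),
                ColumnarValue::Scalar(s) => vec![*s; rows],
            })
            .collect();

        let out = inner(&arrays);
        if all_scalar {
            // Collapse the one-row result back into a scalar.
            ColumnarValue::Scalar(out[0])
        } else {
            ColumnarValue::Array(out)
        }
    }
}

/// Toy kernel: row-wise sum across all input columns.
fn add_kernel(cols: &[Vec<i64>]) -> Vec<i64> {
    let rows = cols.first().map_or(0, |c| c.len());
    (0..rows)
        .map(|i| cols.iter().map(|c| c[i]).sum())
        .collect()
}

fn main() {
    let add = wrap_array_kernel(add_kernel);
    // All-scalar input stays a one-row computation and returns a scalar.
    println!("{:?}", add(&[ColumnarValue::Scalar(1), ColumnarValue::Scalar(2)]));
    // Mixed input: the scalar is broadcast to the array length.
    println!(
        "{:?}",
        add(&[ColumnarValue::Array(vec![1, 2, 3]), ColumnarValue::Scalar(10)])
    );
}
```

Whether this is actually where the difference in the benchmark comes from is still only a guess until the next run.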
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]