goldmedal commented on issue #11546: URL: https://github.com/apache/datafusion/issues/11546#issuecomment-2241195188
Ok, I think it's getting worse.

```
Gnuplot not found, using plotters backend
map_1000                time:   [9.1289 ms 9.1897 ms 9.2537 ms]
                        change: [-0.7915% +0.1565% +1.1967%] (p = 0.75 > 0.05)
                        No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

Benchmarking map_one_1000: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.4s, or reduce sample count to 70.
map_one_1000            time:   [62.787 ms 63.239 ms 63.732 ms]
                        change: [-2.6609% -1.0620% +0.3899%] (p = 0.17 > 0.05)
                        No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
```

I also tried removing

```
let mut args = keys;
args.extend(values);
```

and just passing an args vector to `map_from_array`, but it's still slower.

I pushed this version to a different branch: https://github.com/goldmedal/datafusion/blob/feature/11546-map-df-api-v4/datafusion/functions/src/core/map.rs
If you're interested, you can check it out.

Actually, I found that `make_scalar_function` uses `ColumnarValue::values_to_arrays`, so I need to use `make_array_inner` to aggregate the primitive arrays. In conclusion, the original design (using `make_array`) is the fastest.
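For anyone following along, here is a minimal, standard-library-only sketch of what the `let mut args = keys; args.extend(values);` step does. It is not the DataFusion code: `pack_args` and `unpack_args` are hypothetical names, plain vectors stand in for the real argument types, and the layout (all keys first, then all values, split at the midpoint) is an assumption made purely for illustration.

```rust
/// Hypothetical helper: packs keys and values into one flat argument vector,
/// mirroring the `let mut args = keys; args.extend(values);` pattern.
fn pack_args<T>(keys: Vec<T>, values: Vec<T>) -> Vec<T> {
    let mut args = keys; // take ownership of the keys vector
    args.extend(values); // append all values after the keys
    args
}

/// Hypothetical inverse: splits a flat argument slice back into keys and values.
/// Assumes the first half holds the keys and the second half holds the values.
fn unpack_args<T>(args: &[T]) -> Option<(&[T], &[T])> {
    if args.len() % 2 != 0 {
        return None; // an odd length cannot be an equal number of keys and values
    }
    Some(args.split_at(args.len() / 2))
}

fn main() {
    let keys = vec!["a", "b", "c"];
    let values = vec!["1", "2", "3"];

    let args = pack_args(keys, values);
    let (k, v) = unpack_args(&args).expect("even number of arguments");

    println!("keys = {:?}, values = {:?}", k, v);
}
```

Passing a ready-made args vector, as tried above, skips the `pack_args` step on the caller side. The sketch only shows the data layout and says nothing about where the time measured in the benchmark is actually spent.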