Re: [I] Easier Dataframe API for `map` [datafusion]

via GitHub Sat, 20 Jul 2024 09:11:29 -0700


goldmedal commented on issue #11546:
URL: https://github.com/apache/datafusion/issues/11546#issuecomment-2241195188


   Ok, I think it's getting worse. 
   ```
   Gnuplot not found, using plotters backend
   map_1000                time:   [9.1289 ms 9.1897 ms 9.2537 ms]
                           change: [-0.7915% +0.1565% +1.1967%] (p = 0.75 > 0.05)
                           No change in performance detected.
   Found 4 outliers among 100 measurements (4.00%)
     3 (3.00%) high mild
     1 (1.00%) high severe
   
   Benchmarking map_one_1000: Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.4s, or reduce sample count to 70.
   map_one_1000            time:   [62.787 ms 63.239 ms 63.732 ms]
                           change: [-2.6609% -1.0620% +0.3899%] (p = 0.17 > 0.05)
                           No change in performance detected.
   Found 5 outliers among 100 measurements (5.00%)
   ```
   I also tried to remove
   ```
       let mut args = keys;
       args.extend(values);
   ```
   Instead, I passed a single args vector directly to `map_from_array`, but it's still slower. I pushed this version to a different branch: https://github.com/goldmedal/datafusion/blob/feature/11546-map-df-api-v4/datafusion/functions/src/core/map.rs
   If you're interested, you can check it out.
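   
   For readers following along, here is a minimal std-only sketch of what the removed concatenation does: it flattens the keys and values into one argument list, keys first. The `Arg` alias and the helper names `concat_args` and `split_args` are hypothetical stand-ins (the real code works with DataFusion expressions), and splitting at the midpoint is only an assumption about how a receiver could recover the two halves.
   
   ```rust
   // Hypothetical stand-in for an argument; the real code uses DataFusion types.
   type Arg = String;
   
   /// Flatten keys and values into one list: all keys first, then all values.
   /// This mirrors the `let mut args = keys; args.extend(values);` snippet above.
   fn concat_args(keys: Vec<Arg>, values: Vec<Arg>) -> Vec<Arg> {
       let mut args = keys;
       args.extend(values);
       args
   }
   
   /// Assumed inverse: split the flat list at its midpoint into (keys, values).
   /// Returns None when the list cannot hold equally many keys and values.
   fn split_args(mut args: Vec<Arg>) -> Option<(Vec<Arg>, Vec<Arg>)> {
       if args.len() % 2 != 0 {
           return None;
       }
       let values = args.split_off(args.len() / 2);
       Some((args, values))
   }
   
   fn main() {
       let keys = vec!["a".to_string(), "b".to_string()];
       let values = vec!["1".to_string(), "2".to_string()];
       let args = concat_args(keys, values);
       println!("flattened: {:?}", args);
       let (k, v) = split_args(args).expect("even number of arguments");
       println!("keys = {:?}, values = {:?}", k, v);
   }
   ```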
   
   Actually, I found that `make_scalar_function` uses `ColumnarValue::values_to_arrays`, so I need to use `make_array_inner` to aggregate the primitive arrays.
   
   In conclusion, the original design (using `make_array`) is the fastest.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


