matthewmturner commented on issue #147: URL: https://github.com/apache/arrow-datafusion/issues/147#issuecomment-1047420945
Cross-post from Slack:

I'm working on updating DataFusion's db-benchmark results based on DataFusion v7. I just got a first cut of the results compared to what I produced a couple of months ago. I was planning on finalizing the analysis before sharing, but I wanted to provide a preview as I may not have time to finish for a day or two. This was produced using datafusion-python on an M1 MacBook.

On December 27th we were at the below for group by:

```
0.11225258399999993 # q1
0.695109333 # q2
2.932470125 # q3
0.07341450000000016 # q4
3.3075385419999996 # q5
2.9051008750000005 # q7
4.573697916 # q8
68.875322208 # q10
```

Based on DataFusion version 7:

```
q1: 0.03743266599999995
q2: 0.4997687500000001
q3: 2.119365208
q4: 0.034825500000000176
q5: 2.144292417
q7: 2.0165450419999997
q8: 2.9783209999999993
q10: 47.229685542
```

We've seen pretty good performance increases across the board with the latest release. Compared to the currently published db-benchmark results, that would put DataFusion as the fastest, or tied for fastest, on group-by queries Q1 and Q4. In general, we had similar results to Spark.

For join, in December we had:

```
q1 took 261 ms
q2 took 367 ms
q3 took 334 ms
q4 took 507 ms
q5 took 1936 ms
```

and now we are at:

```
q1: 0.5796001249999999
q2: 0.4178434580000001
q3: 0.4701954159999999
q4: 0.4357888750000001
q5: 1.8161980410000003
```

We have lost some performance on the join side. I'm not sure why, but compared to other engines we are still doing very well, with basically the best performance across the board.

Please take these results as preliminary; I'm still working through things.

I'm going to work on adding the missing group-by queries now with the latest v7 functionality. I was also thinking of contributing a script that would run the whole db-benchmark process so that anyone could run db-benchmark as needed.
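The timings quoted in the comment can be compared per query with a few lines of Python. This is only a sketch under stated assumptions: the group-by figures and the v7 join figures are assumed to be in seconds (the comment does not state units), the December join figures are converted from the reported milliseconds, and the names `groupby_dec`, `groupby_v7`, `join_dec_ms`, `join_v7_s`, and `speedups` are hypothetical helpers rather than part of db-benchmark or datafusion-python.

```python
# Timings copied from the comment; group-by values are ASSUMED to be seconds.
groupby_dec = {
    "q1": 0.11225258399999993, "q2": 0.695109333, "q3": 2.932470125,
    "q4": 0.07341450000000016, "q5": 3.3075385419999996,
    "q7": 2.9051008750000005, "q8": 4.573697916, "q10": 68.875322208,
}
groupby_v7 = {
    "q1": 0.03743266599999995, "q2": 0.4997687500000001, "q3": 2.119365208,
    "q4": 0.034825500000000176, "q5": 2.144292417,
    "q7": 2.0165450419999997, "q8": 2.9783209999999993, "q10": 47.229685542,
}

# December join timings were reported in milliseconds.
join_dec_ms = {"q1": 261, "q2": 367, "q3": 334, "q4": 507, "q5": 1936}
# v7 join timings are ASSUMED to be seconds (units are not stated in the comment).
join_v7_s = {
    "q1": 0.5796001249999999, "q2": 0.4178434580000001,
    "q3": 0.4701954159999999, "q4": 0.4357888750000001,
    "q5": 1.8161980410000003,
}


def speedups(before, after):
    """Return the before/after ratio per query; a value above 1 means the newer run is faster."""
    return {q: before[q] / after[q] for q in before}


groupby_speedup = speedups(groupby_dec, groupby_v7)

# Convert the December join timings to seconds so both runs share a unit.
join_dec_s = {q: ms / 1000 for q, ms in join_dec_ms.items()}
join_speedup = speedups(join_dec_s, join_v7_s)

for name, table in (("groupby", groupby_speedup), ("join", join_speedup)):
    for q, ratio in table.items():
        print(f"{name} {q}: {ratio:.2f}x")
```

If the unit assumption holds, every group-by ratio comes out above 1, while the join ratios fall below 1 for q1 through q3 and slightly above 1 for q4 and q5.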