Lordworms opened a new issue, #9547: URL: https://github.com/apache/arrow-datafusion/issues/9547
### Is your feature request related to a problem or challenge?

I was doing a course project on efficiency comparison, and I used VTune on the TPC-H benchmark to compare the efficiency of DataFusion and DuckDB. The results indicate that there might be some efficiency issues.

I also noticed that DataFusion's effective CPU time is much higher than DuckDB's, yet its runtime on TPC-H is slower (it seems we are not really achieving parallelism, and I suspect part of the problem comes from Tokio).

This is DuckDB's result: [image]

This is DataFusion's result: [image]

The flame graphs also show that DataFusion has a much deeper stack.

DuckDB: [image]

DataFusion: [image]

This made me somewhat distrustful of Tokio.

I also wondered whether the slower performance is due to incomplete use of SIMD instructions, so I collected some statistics on SIMD instructions using PIN (the result may not be that precise, but I expected the number of SIMD instructions generated to be comparable). The results are shown below:

| SIMD instruction | DataFusion count | DuckDB count |
| ---------------- | ----------------: | ------------: |
| ADDSD | 34 | 25 |
| CMPSD_XMM | 1 | 6 |
| COMISD | - | 44 |
| DIVSD | 14 | 32 |
| MAXSD | 1 | 1 |
| MULSD | 21 | 52 |
| PACKUSWB | 5 | 7 |
| PADDB | 30 | 12 |
| PADDD | 100 | 33 |
| PADDQ | 291 | 200 |
| PADDW | 8 | 5 |
| PCMPEQB | 548 | 544 |
| PCMPEQD | 58 | 38 |
| PCMPGTB | - | 1 |
| PCMPGTD | 44 | 14 |
| PCMPGTW | - | 6 |
| PMINUB | 8 | 20 |
| PMOVMSKB | 1169 | 278 |
| PMULHUW | 1 | 2 |
| PMULLW | 1 | 2 |
| PMULUDQ | - | 4 |
| PSHUFD | 646 | 88 |
| PSLLD | 6 | 2 |
| PSLLDQ | 72 | 217 |
| PSLLQ | 213 | 16 |
| PSLLW | 30 | 2 |
| PSRAD | 8 | - |
| PSRLD | 3 | 40 |
| PSRLDQ | 39 | 179 |
| PSRLQ | 11 | 7 |
| PSUBB | 84 | 243 |
| PSUBD | 4 | 3 |
| PSUBQ | 12 | 4 |
| PSUBUSB | - | 6 |
| PSUBW | - | 6 |
| PUNPCKHBW | 41 | 7 |
| PUNPCKHDQ | 45 | 66 |
| PUNPCKHQDQ | 102 | 14 |
| PUNPCKHWD | 42 | 50 |
| PUNPCKLBW | 211 | 19 |
| PUNPCKLDQ | 94 | 338 |
| PUNPCKLQDQ | 353 | 2713 |
| PUNPCKLWD | 73 | 80 |
| ROUNDSD | 1 | - |
| SHUFPD | 4 | 20 |
| SHUFPS | - | 28 |
| SQRTSD | - | 2 |
| SUBSD | 10 | 19 |
| UCOMISD | 16 | 39 |
| VPCMPB | 56 | 86 |
| VPCMPUB | 206 | 19 |
| VPMINUB | 2 | 15 |
| **Total** | 4851 | 5293 |

It turns out that DataFusion may use fewer SIMD instructions than DuckDB (this might be a rustc issue).

### Describe the solution you'd like

I plan to look into this the week after next, but I have no leads yet.

### Describe alternatives you've considered

_No response_

### Additional context

_No response_

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
