parthchandra commented on PR #1034: URL: https://github.com/apache/datafusion-comet/pull/1034#issuecomment-2435919148
Initial performance numbers for this implementation are not looking good. There are two areas where things are getting slower compared to Spark:

1. No WholeStageCodegen: the iteration over rows alone adds extra cost to the implementation.
2. We incur an additional cost of creating UnsafeRows in native code, which is more expensive than the calls Spark makes to extract values out of an Arrow vector.

Here's the initial benchmark run for just integer types:

```
Running benchmark: ColumnarToRowExec
  Running case: Spark Columnar To Row - integer
  Stopped after 34 iterations, 2029 ms
  Running case: Comet Columnar To Row - integer
  Stopped after 24 iterations, 2081 ms

OpenJDK 64-Bit Server VM 11.0.19+7-LTS on Mac OS X 14.6
Apple M3 Max
ColumnarToRowExec:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark Columnar To Row - integer                      40             60          14        262.1           3.8       1.0X
Comet Columnar To Row - integer                      53             87          32        198.2           5.0       0.8X
```
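
For readers who want to sanity-check the table, here is a minimal sketch of how the last three columns relate to each other. It assumes that Per Row(ns) is simply 1000 divided by Rate(M/s) and that Relative is the ratio of the two rates with Spark as the baseline. That relationship is an assumption that happens to match the printed figures, not something stated in the comment. The variable and function names are made up for illustration.

```python
# Figures copied from the benchmark table above.
spark_rate_m_per_s = 262.1
comet_rate_m_per_s = 198.2


def per_row_ns(rate_m_per_s: float) -> float:
    """Convert a throughput in million rows per second to nanoseconds per row.

    Assumed relationship: 1e9 ns/s divided by (rate * 1e6 rows/s).
    """
    return 1000.0 / rate_m_per_s


spark_ns = per_row_ns(spark_rate_m_per_s)
comet_ns = per_row_ns(comet_rate_m_per_s)

# Relative speed of Comet with Spark as the 1.0X baseline (assumed definition).
relative = comet_rate_m_per_s / spark_rate_m_per_s

# Extra time Comet spends per row, as a fraction of Spark's per-row time.
slowdown = comet_ns / spark_ns - 1.0

print(f"Spark: {spark_ns:.1f} ns/row")
print(f"Comet: {comet_ns:.1f} ns/row")
print(f"Relative: {relative:.1f}X, about {slowdown:.0%} more time per row")
```

Under these assumptions the script reproduces the 3.8 ns and 5.0 ns per-row values and the 0.8X relative figure, and it puts the gap at roughly a third more time per row for the Comet path in this single integer-only run.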