zhuqi-lucas commented on issue #16710:
URL: https://github.com/apache/datafusion/issues/16710#issuecomment-3074225566

   > DataFusion is underperforming the Polars streaming engine on some 
localhost join queries (1e8 rows of data on a Macbook M3 with 16GB of RAM):
   > 
   > [Image]
   > Here are the [join queries](https://github.com/apache/datafusion/blob/main/benchmarks/queries/h2o/join.sql).
   > 
   > I am guessing the join operator can be optimized, similar to how the 
filtering and aggregation operations were optimized.
   > 
   > Here is an example of how the median function was made faster: 
[#13550](https://github.com/apache/datafusion/issues/13550)
   > 
   > See this epic for more info: 
[#13548](https://github.com/apache/datafusion/issues/13548)
   
   Is this comparison based on Parquet or CSV data? Our h2o benchmark tool 
currently uses the CSV format.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

