mrpowers-wb commented on issue #16710: URL: https://github.com/apache/datafusion/issues/16710#issuecomment-3078052376
@zhuqi-lucas - these benchmarks use Parquet files; see the querybench repo for the code: https://github.com/MrPowers/querybench. I think Parquet is a lot better for these benchmarks.

The data generation scripts are in falsa if you'd like to generate the files locally: https://github.com/mrpowers-io/falsa/ (thanks @SemyonSinchenko!)

DuckDB isn't included because it can't handle the joins on the 1e8 datasets on my machine; I suspect it runs out of memory. It handles the 1e7 datasets fine.

There are 5 h2o join queries. q5 (the join between two large tables) is omitted because no engine can handle joining the 1e8 table with another 1e8 table on my machine, which has 16GB of RAM.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org