zhuqi-lucas commented on issue #16710:
URL: https://github.com/apache/datafusion/issues/16710#issuecomment-3078275180

   > [@zhuqi-lucas](https://github.com/zhuqi-lucas) - these benchmarks use
   > Parquet files, see the querybench repo for the code:
   > https://github.com/MrPowers/querybench. I think Parquet is a lot better for
   > these benchmarks.
   >
   > The data generation scripts are in falsa if you'd like to generate the
   > files locally: https://github.com/mrpowers-io/falsa/ (thanks
   > [@SemyonSinchenko](https://github.com/SemyonSinchenko)!)
   >
   > DuckDB isn't included because it can't handle the joins on my machine with
   > the 1e8 datasets. I guess it runs out of memory. It can handle the 1e7 datasets
   > fine.
   >
   > There are 5 h2o join queries and q5 is omitted (the join between two large
   > tables) because no engine can handle joining the 1e8 table with another 1e8
   > table on my machine with 16GB of RAM.
   
   Thank you @mrpowers-wb for the good explanation. I will first submit a PR
   for the DataFusion h2o benchmark to support the Parquet format, so that we
   can use the tool for this comparison and optimize based on it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


