mrpowers-wb commented on issue #16710:
URL: https://github.com/apache/datafusion/issues/16710#issuecomment-3078052376

   @zhuqi-lucas - these benchmarks use Parquet files; see the querybench repo for the code: https://github.com/MrPowers/querybench. I think Parquet is a much better fit for these benchmarks.
   
   The data generation scripts are in falsa if you'd like to generate the files locally: https://github.com/mrpowers-io/falsa/ (thanks @SemyonSinchenko!)
   
   DuckDB isn't included because it can't complete the joins on the 1e8 datasets on my machine; I suspect it runs out of memory. It handles the 1e7 datasets fine.
   
   There are 5 h2o join queries. q5 (the join between two large tables) is omitted because no engine can join the 1e8 table with another 1e8 table on my machine, which has 16GB of RAM.

