lmwnshn commented on issue #14608:
URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2651044600

   If you prefer Java to C, CMU-DB's BenchBase project supports generating 
and loading TPC-H data in parallel: 
https://github.com/cmu-db/benchbase/tree/main/src/main/java/com/oltpbenchmark/benchmarks/tpch
 
   
   Another alternative that I explored is to generate the data with DuckDB, 
export it as Parquet, and then ingest it into DataFusion (the schema may 
need fixing):
   
   ```
   ./duckdb data/tpch.db -c "INSTALL tpch; LOAD tpch; CALL dbgen(sf = 1); EXPORT DATABASE './data/' (FORMAT PARQUET);"
   ```
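
   For the ingestion step, here is a minimal sketch rather than a tested 
recipe. It assumes the DuckDB export wrote one `<table>.parquet` file per 
TPC-H table into `./data/`, which is worth checking against the actual 
output. The helper name `tpch_ddl` is hypothetical; it only prints 
`CREATE EXTERNAL TABLE` statements that DataFusion can run:

```shell
#!/bin/sh
# Hypothetical helper: print one CREATE EXTERNAL TABLE statement per TPC-H table.
# Assumption: the export directory contains <table>.parquet for each table.
tpch_ddl() {
    dir="${1:-./data}"
    for t in customer lineitem nation orders part partsupp region supplier; do
        printf "CREATE EXTERNAL TABLE %s STORED AS PARQUET LOCATION '%s/%s.parquet';\n" \
            "$t" "$dir" "$t"
    done
}

# Print the statements for the default export directory.
tpch_ddl ./data
```

   One way to use this is to redirect the output into a file such as 
`register_tpch.sql`, append a query, and run it with `datafusion-cli -f 
register_tpch.sql`. The tables only exist for that session, and any column 
types that come out wrong still have to be fixed by hand.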
   
   But personally I think native integration makes for the best user experience.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

