lmwnshn commented on issue #14608:
URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2651044600

   If you prefer Java to C, CMU-DB's BenchBase project supports generating 
and loading TPC-H data in parallel: 
https://github.com/cmu-db/benchbase/tree/main/src/main/java/com/oltpbenchmark/benchmarks/tpch
 
   
   Another alternative that I explored is to generate the data with DuckDB, 
export it as Parquet, and then ingest it into DataFusion (the schema may 
need fixing):
   
   ```
   # data/ must exist before DuckDB can create data/tpch.db
   mkdir -p data
   ./duckdb data/tpch.db -c "INSTALL tpch; LOAD tpch; CALL dbgen(sf = 1); EXPORT DATABASE './data/' (FORMAT PARQUET);"
   ```
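
For the ingestion step, a minimal sketch of one possible approach: emit a `CREATE EXTERNAL TABLE` statement for each of the eight TPC-H tables and run the result through DataFusion. The helper name `gen_tpch_ddl` is hypothetical, and the sketch assumes the DuckDB export wrote one `<table>.parquet` file per table into `./data`; check the directory contents before relying on those names.

```shell
#!/bin/sh
# Sketch only: print one CREATE EXTERNAL TABLE statement per TPC-H table.
# Assumption: the export directory contains <table>.parquet for each table.
# gen_tpch_ddl is a hypothetical helper name, not part of any tool.
gen_tpch_ddl() {
    dir="${1:-./data}"
    for t in customer lineitem nation orders part partsupp region supplier; do
        printf "CREATE EXTERNAL TABLE %s STORED AS PARQUET LOCATION '%s/%s.parquet';\n" \
            "$t" "$dir" "$t"
    done
}

# Write the statements to stdout; redirect them into a file to reuse them.
gen_tpch_ddl ./data
```

Saving the output to a file (for example `register_tpch.sql`, a name chosen here for illustration) and passing it to `datafusion-cli -f register_tpch.sql` should register the tables, although column types may still need adjusting, as noted above.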
   
   But personally I think native integration makes for the best user experience.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

