alamb opened a new pull request, #19035:
URL: https://github.com/apache/datafusion/pull/19035

   ## Which issue does this PR close?
   
   - builds on https://github.com/apache/datafusion/pull/19033
   - builds on https://github.com/apache/datafusion/pull/19034
   
   ## Rationale for this change
   
   tpchgen-cli is 10x faster than dbgen for generating tpch data (see blog here)
   
   Thus let's use that to generate tpch data for our benchmarks, rather than ancient docker / tpchgen
   
   While I was testing this locally I also found a bunch of un
   
   
   ## What changes are included in this PR?
   
   1. Use tpchgen-cli to generate tpch data for our benchmarks
   2. Remove the "convert" code from the binary
   3. Update the readme to explain how to use tpchgen-cli to generate data
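   
   For reviewers unfamiliar with tpchgen-cli, here is a rough sketch of what a direct invocation might look like. It is not taken from `bench.sh`: the helper name `tpchgen_command` and the output directories are hypothetical, and I am assuming the `--scale-factor`, `--format`, and `--output-dir` options as documented by tpchgen-cli. The script only prints the commands (a dry run), so it works without tpchgen-cli installed.
   
   ```shell
   #!/usr/bin/env bash
   # Sketch only: bench.sh in this PR may invoke tpchgen-cli differently.

   # Hypothetical helper (not part of bench.sh): print the tpchgen-cli
   # command for one scale factor, output format, and output directory.
   tpchgen_command() {
     local scale_factor="$1"   # TPC-H scale factor, e.g. 1
     local format="$2"         # assumed values: tbl, csv, or parquet
     local output_dir="$3"     # hypothetical output location
     printf 'tpchgen-cli --scale-factor %s --format %s --output-dir %s\n' \
       "$scale_factor" "$format" "$output_dir"
   }

   # Dry run: show the commands without executing them.
   tpchgen_command 1 parquet data/tpch_sf1
   tpchgen_command 1 csv data/tpch_csv_sf1
   ```
   
   Printing the command rather than running it keeps the sketch safe to paste; to actually generate data, run the printed line once tpchgen-cli is installed.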
   
   
   ## Are these changes tested?
   
   I tested them manually using
   ```shell
   ./benchmarks/bench.sh data tpch
   ./benchmarks/bench.sh run tpch
   
   ./benchmarks/bench.sh data tpch_mem
   ./benchmarks/bench.sh run tpch_mem
   
   ./benchmarks/bench.sh data tpch_csv
   ./benchmarks/bench.sh run tpch_csv
   ```
   
   
   ## Are there any user-facing changes?
   
   No, this is internal development code
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

