Hi Kelly,

You may follow the steps in the benchmark GitHub workflow:
https://github.com/apache/spark/blob/master/.github/workflows/benchmark.yml
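
For reference, the data-generation part of that workflow can be sketched roughly as below. This is a sketch under assumptions, not a copy of the workflow: it assumes you run it from the root of a Spark checkout on Linux, that dsdgen is built from the databricks/tpcds-kit repository, and that scale factor 1 is enough. The variable names (`SPARK_DIR`, `KIT_DIR`, `DATA_DIR`, `SCALE_FACTOR`, `DRY_RUN`) and the `run` helper are made up for illustration. Please check benchmark.yml for the exact commands and options used by your Spark version.

```shell
#!/bin/sh
# Sketch: generate TPC-DS data, then point TPCDSQueryBenchmark at it.
# Assumptions (not from this thread): Spark checkout in the current directory,
# dsdgen built from databricks/tpcds-kit, Linux host, scale factor 1.
set -eu

SPARK_DIR="${SPARK_DIR:-$PWD}"
KIT_DIR="${KIT_DIR:-$SPARK_DIR/tpcds-kit}"
DATA_DIR="${DATA_DIR:-$SPARK_DIR/tpcds-sf-1}"
SCALE_FACTOR="${SCALE_FACTOR:-1}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print each step; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

# 1. Fetch and build the dsdgen tool.
run git clone https://github.com/databricks/tpcds-kit.git "$KIT_DIR"
run make -C "$KIT_DIR/tools" OS=LINUX

# 2. Generate the TPC-DS tables with Spark's GenTPCDSData test utility.
run "$SPARK_DIR/build/sbt" "sql/Test/runMain org.apache.spark.sql.GenTPCDSData --dsdgenDir $KIT_DIR/tools --location $DATA_DIR --scaleFactor $SCALE_FACTOR"

# 3. Run the benchmark against the generated data.
run "$SPARK_DIR/build/sbt" "sql/Test/runMain org.apache.spark.sql.execution.benchmark.TPCDSQueryBenchmark --data-location $DATA_DIR"
```

By default the script only prints the four commands so you can review them first; running it with `DRY_RUN=0` executes them. The directory passed as `--location` in step 2 is the same one passed as `--data-location` in step 3.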

Regards,
Manu

On Mon, May 15, 2023 at 5:49 PM zhangliyun <kelly...@126.com> wrote:

> Hi,
>
> I want to set up a TPC-DS benchmark to test the performance of some
> Spark features.
> I saw that TPCDSQueryBenchmark requires --data-location to be passed to the
> class. My question is: how do I generate the TPC-DS data for this benchmark?
> ```
> /**
>  * Benchmark to measure TPCDS query performance.
>  * To run this:
>  * {{{
>  *   1. without sbt:
>  *        bin/spark-submit --jars <spark core test jar>,<spark catalyst test jar>
>  *          --class <this class> <spark sql test jar> --data-location <location>
>  *   2. build/sbt "sql/test:runMain <this class> --data-location <TPCDS data location>"
>  *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt
>  *        "sql/test:runMain <this class> --data-location <location>"
>  *      Results will be written to "benchmarks/TPCDSQueryBenchmark-results.txt".
>  * }}}
>  */
> object TPCDSQueryBenchmark extends SqlBasedBenchmark with Logging {
> ```
>
> Best Regards
>
> Kelly Zhang / Liyun Zhang