Hi Kelly,

To generate the data, you can follow the steps in the benchmark GitHub workflow:
https://github.com/apache/spark/blob/master/.github/workflows/benchmark.yml
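
As far as I can tell, that workflow builds the dsdgen tool from
databricks/tpcds-kit, runs Spark's GenTPCDSData test class to write the
tables, and then passes that location to the benchmark. Below is a rough
sketch of those steps, not a copy of the workflow. The variable names
(SPARK_SRC, KIT_DIR, DATA_DIR, SCALE_FACTOR, DRY_RUN) and their default
paths are placeholders I made up, and the script only prints the commands
unless DRY_RUN=0. Please check the workflow file itself for the exact
commands and the pinned tpcds-kit revision.

```shell
#!/usr/bin/env bash
# Sketch only: approximates the data-generation steps of benchmark.yml.
set -euo pipefail

# Hypothetical defaults, adjust to your setup.
SPARK_SRC="${SPARK_SRC:-$PWD}"                 # root of a Spark source checkout
KIT_DIR="${KIT_DIR:-$SPARK_SRC/tpcds-kit}"     # where tpcds-kit gets cloned
DATA_DIR="${DATA_DIR:-$SPARK_SRC/tpcds-sf-1}"  # output dir for generated tables
SCALE_FACTOR="${SCALE_FACTOR:-1}"              # TPC-DS scale factor
DRY_RUN="${DRY_RUN:-1}"                        # 1 = print commands, 0 = run them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then printf '%s\n' "$1"; else bash -c "$1"; fi
}

# 1. Fetch and build dsdgen (needs git, make, gcc, flex and bison).
clone_cmd="git clone https://github.com/databricks/tpcds-kit.git '$KIT_DIR'"
build_cmd="make -C '$KIT_DIR/tools' OS=LINUX"

# 2. Generate the tables with Spark's GenTPCDSData test class.
gen_args="--dsdgenDir $KIT_DIR/tools --location $DATA_DIR"
gen_args="$gen_args --scaleFactor $SCALE_FACTOR --numPartitions 1 --overwrite"
gen_main="org.apache.spark.sql.GenTPCDSData"
gen_cmd="cd '$SPARK_SRC' && build/sbt \"sql/test:runMain $gen_main $gen_args\""

# 3. Point TPCDSQueryBenchmark at the generated data.
bench_main="org.apache.spark.sql.execution.benchmark.TPCDSQueryBenchmark"
bench_args="--data-location $DATA_DIR"
bench_cmd="cd '$SPARK_SRC' && build/sbt \"sql/test:runMain $bench_main $bench_args\""

run "$clone_cmd"
run "$build_cmd"
run "$gen_cmd"
run "$bench_cmd"
```

Run it once as-is from the Spark source root to review the printed
commands, then again with DRY_RUN=0 to execute them. A larger SCALE_FACTOR
produces more data and takes correspondingly longer to generate.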

Regards,
Manu

On Mon, May 15, 2023 at 5:49 PM zhangliyun <kelly...@126.com> wrote:

> Hi,
>
> I want to set up a TPC-DS benchmark to test the performance of some
> Spark features. I saw that TPCDSQueryBenchmark needs --data-location
> passed to the class. My question is: how do I generate the TPC-DS data
> for this benchmark?
> ```
> /**
> * Benchmark to measure TPCDS query performance.
> * To run this:
> * {{{
> * 1. without sbt:
> * bin/spark-submit --jars <spark core test jar>,<spark catalyst test jar>
> * --class <this class> <spark sql test jar> --data-location <location>
> * 2. build/sbt "sql/test:runMain <this class> --data-location <TPCDS data location>"
> * 3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt
> * "sql/test:runMain <this class> --data-location <location>"
> * Results will be written to "benchmarks/TPCDSQueryBenchmark-results.txt".
> * }}}
> */
> object TPCDSQueryBenchmark extends SqlBasedBenchmark with Logging {
>
> ```
>
>
> Best Regards
>
> Kelly Zhang /Liyun Zhang
>
