FelixYBW commented on code in PR #7642:
URL: https://github.com/apache/incubator-gluten/pull/7642#discussion_r1841420391


##########
tools/workload/benchmark_velox/README.md:
##########
@@ -0,0 +1,38 @@
+# Setup, Build and Benchmark Spark/Gluten with Jupyter Notebook
+
+This guide provides notebooks and scripts for conducting performance testing in Gluten. The standard approach involves setting up the test environment on a bare-metal machine or cloud instance and running performance tests with TPC-H/TPC-DS workloads. These scripts enable users to reproduce our performance results in their own environment.
+
+## Environment Setup
+
+The recommended OS is ubuntu22.04 with kernel 5.15. To prepare the environment, run [initialize.ipynb](./initialize.ipynb), which will:
+
+- Install system dependencies and set up jupyter notebook
+- Configure Hadoop and Spark
+- Configure kernel parameters
+- Build Gluten using Docker
+- Generate TPC-H/TPC-DS tables
+
+## Running TPC-H/TPC-DS Benchmarks
+
+To run TPC-H/TPC-DS benchmarks, use [tpc_workload.ipynb](./tpc_workload.ipynb). You can create a copy of the notebook and modify the parameters defined in this notebook to run different workloads. However, creating and modifying a copy each time you change workloads can be inconvenient. Instead, it's recommended to use Papermill to pass parameters via the command line for greater flexibility.
+
+The required parameters are specified in [params.yaml.template](./params.yaml.template). To use it, create your own YAML file by copying and modifying the template. The command to run the notebook is:
+
+```bash
+papermill tpc_workload.ipynb --inject-output-path -f params.yaml gluten_tpch.ipynb
+```
+After execution, the output notebook will be saved as `gluten_tpch.ipynb`.
+
+If you want to use different parameters, you can specify them via the `-f` option. It will overwrite the previously defined parameters in `params.yaml`. e.g. To switch to the TPC-DS workload, run:

Review Comment:
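   This should be `-p`, not `-f`: "specify them via the `-p` option". In Papermill, `-f` points to a YAML parameters file, while `-p` sets an individual parameter by name.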
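
   A rough sketch of how the corrected example could look, assuming the notebook exposes a parameter named `workload` (a hypothetical name; the real ones are listed in `params.yaml.template`). As far as I know, Papermill applies `-p name value` after loading the `-f` file, so the `-p` value should take precedence. The snippet only prints the command so it can be reviewed before running:

```bash
# Hypothetical parameter name and value; check params.yaml.template for the real ones.
PARAM_NAME=workload
PARAM_VALUE=tpcds

# -f loads the defaults from the YAML file; -p overrides a single parameter by name.
CMD="papermill tpc_workload.ipynb --inject-output-path -f params.yaml -p ${PARAM_NAME} ${PARAM_VALUE} gluten_${PARAM_VALUE}.ipynb"

# Print the command for review; to execute it, run: eval "$CMD"
echo "$CMD"
```

   Naming the output notebook after the parameter value keeps the TPC-H and TPC-DS results from overwriting each other.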



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

