yahoNanJing commented on pull request #1983: URL: https://github.com/apache/arrow-datafusion/pull/1983#issuecomment-1065411854
With this PR, load-testing performance improves a lot.

### Cluster:
One scheduler + one executor with 4 task slots.

### Benchmark of Load Testing:
Run the following command with 200 requests and a concurrency of 50 on 1g tpch testing data:

`./tpch loadtest ballista-load --requests 200 --concurrency 50 --host localhost --port 50050 --sql-path queries/ --format parquet --data-path data/tpch-1g-oneFile --query-list 1`

|TaskSchedulingPolicy|Pull|Push|
| --- | --- | --- |
|`Before PR`| 1. load test took 419353.8 ms <br> 2. load test took 1008875.6 ms <br> 3. load test took 1543815.8 ms|
|`With PR`| 1. load test took 169643.3 ms <br> 2. load test took 178026.6 ms <br> 3. load test took 189341.6 ms| 1. load test took 235245.9 ms <br> 2. load test took 208277.6 ms <br> 3. load test took 220425.3 ms|

It seems there's a bug on the master branch when load testing with push-based task scheduling. Will check it later.

### Single Query Performance
As a baseline for comparison, we need to know the single-query performance. Run the following command with 1 request and a concurrency of 1 on 1g tpch testing data:

`./tpch loadtest ballista-load --requests 1 --concurrency 1 --host localhost --port 50050 --sql-path queries/ --format parquet --data-path data/tpch-1g-oneFile --query-list 1`

|TaskSchedulingPolicy|Pull|Push|
| --- | --- | --- |
|`Before PR`| 1. load test took 1765.2 ms <br> 2. load test took 1535.3 ms <br> 3. load test took 1534.2 ms | 1. load test took 617.6 ms <br> 2. load test took 615.6 ms <br> 3. load test took 616.6 ms |
|`With PR`| 1. load test took 1631.0 ms <br> 2. load test took 1327.6 ms <br> 3. load test took 1327.8 ms | 1. load test took 616.3 ms <br> 2. load test took 616.3 ms <br> 3. load test took 614.7 ms |
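For a rough sense of the improvement, the load-test timings reported above can be averaged and compared. This is only a sketch: the variable and function names are made up for illustration, and it assumes the single `Before PR` cell belongs to the Pull column, which is where the table as written puts it. The three `Before PR` runs also grow a lot from run to run, so their mean is a loose summary at best.

```python
from statistics import mean

# Load-test timings in milliseconds, copied from the comment (three runs each).
# Assumption: the lone "Before PR" cell is the Pull column, as the table is written.
load_test_ms = {
    ("Before PR", "Pull"): [419353.8, 1008875.6, 1543815.8],
    ("With PR", "Pull"): [169643.3, 178026.6, 189341.6],
    ("With PR", "Push"): [235245.9, 208277.6, 220425.3],
}


def speedup(before, after):
    """Ratio of the mean 'before' time to the mean 'after' time (>1 means faster)."""
    return mean(before) / mean(after)


pull_speedup = speedup(
    load_test_ms[("Before PR", "Pull")],
    load_test_ms[("With PR", "Pull")],
)

# Print the mean of each configuration in seconds, then the Pull speedup.
for (version, policy), runs in load_test_ms.items():
    print(f"{version:9} {policy:4} mean = {mean(runs) / 1000:.1f} s")
print(f"Pull load-test speedup (mean over 3 runs): {pull_speedup:.2f}x")
```

The same `speedup` helper can be applied to the single-query timings to check that the PR does not slow down the unloaded case.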