WDRshadow commented on PR #2050: URL: https://github.com/apache/systemds/pull/2050#issuecomment-2241790721
@phaniarnab We have changed our code and tested again. The reported time now includes only the `parfor` execution time. Each `parfor` was run 3 times and we report the mean time. Here are the results:

| test_id | num_iteration | 1_gpu_time_sec | 2_gpu_time_sec | boost_rate |
|:---------------------|-----------------:|----------------------:|----------------------:|----------------------:|
| test01_gpuTest_10k | 10000 | 2.0 | 1.7 | 15.0% |
| test01_gpuTest_20k | 20000 | 4.0 | 3.0 | 25.0% |
| test01_gpuTest_50k | 50000 | 11.0 | 7.3 | 33.7% |
| test01_gpuTest_100k | 100000 | 22.3 | 15.0 | 32.7% |
| test01_gpuTest_200k | 200000 | 46.0 | 31.3 | 31.9% |
| test01_gpuTest_500k | 500000 | 109.3 | 79.3 | 27.5% |
| Total | 880000 | 194.6 | 137.6 | 29.3% |

### Test environment:
- CPU: `24 vCPU Intel(R) Xeon(R) Platinum 8255C CPU @ 2.50GHz`
- GPU: `RTX2080Ti` * 2
- RAM: `80G`
- OS: `Ubuntu 18.04`
- CUDA: `10.2`

### Comments:
The table shows that the `boost_rate` does not reach the desired `50%`. This is probably due to insufficient optimisation of `LocalParWorker` or of GPU memory management. We have observed the following factors that may limit multi-GPU performance:

1. Multiple GPUs share a storage space guarded by synchronisation locks. For example, `_gpuObjects` stores the caches in each `Task` for the GPUs to read and record the data. Each time a GPU reads that data, it causes blocking.
2. The `TaskPartitioner` design may not be optimal. When there is a large amount of data but only a small number of `threads`, few `Task`s are allocated, so each single `Task` carries more computation. However, errors can occur in GPU computation, in which case the `Task` has to be recomputed, and this costs more time if the failed `Task` is "big". This can be mitigated by improving `Task` allocation.
3. The speedup will improve greatly when the `Task`s are divided equally among the GPUs and there are no errors in the computation.
However, I have observed that in the case of dual graphics cards, one card may execute more `Tasks` than the other.
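
For reference, the `boost_rate` column in the table above appears to be the relative reduction in run time, `(1_gpu_time - 2_gpu_time) / 1_gpu_time`. This is an inference from the numbers, not something stated explicitly, and the function names below are made up for illustration. Under that assumption, a minimal Python sketch recomputes the column from the rounded timings and also derives the plain speedup and the per-GPU efficiency:

```python
# Timings copied from the table above: (test_id, 1-GPU seconds, 2-GPU seconds).
TIMINGS = [
    ("test01_gpuTest_10k", 2.0, 1.7),
    ("test01_gpuTest_20k", 4.0, 3.0),
    ("test01_gpuTest_50k", 11.0, 7.3),
    ("test01_gpuTest_100k", 22.3, 15.0),
    ("test01_gpuTest_200k", 46.0, 31.3),
    ("test01_gpuTest_500k", 109.3, 79.3),
]
NUM_GPUS = 2


def time_reduction(t_single, t_multi):
    """Fraction of the single-GPU time that was saved.

    Assumed to be what the table calls boost_rate.
    """
    return (t_single - t_multi) / t_single


def speedup(t_single, t_multi):
    """Classic speedup: single-GPU time divided by multi-GPU time."""
    return t_single / t_multi


def efficiency(t_single, t_multi, num_gpus=NUM_GPUS):
    """Speedup divided by the number of GPUs (1.0 would be perfect scaling)."""
    return speedup(t_single, t_multi) / num_gpus


def summarize(timings=TIMINGS):
    """Return (name, time_reduction, speedup, efficiency) per test plus a Total row."""
    rows = list(timings)
    total_single = sum(t1 for _, t1, _ in timings)
    total_multi = sum(t2 for _, _, t2 in timings)
    rows.append(("Total", total_single, total_multi))
    return [
        (name, time_reduction(t1, t2), speedup(t1, t2), efficiency(t1, t2))
        for name, t1, t2 in rows
    ]


if __name__ == "__main__":
    for name, red, spd, eff in summarize():
        print(f"{name:<22} {red:6.1%} {spd:5.2f}x {eff:6.1%}")
```

With these definitions, the desired `50%` reduction corresponds to a 2x speedup, while the measured total of `29.3%` corresponds to a speedup of about 1.41x, or roughly 71% efficiency per GPU. Individual rows recomputed this way can differ from the table by about 0.1 percentage points, presumably because the table was derived from unrounded timings.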