phaniarnab commented on PR #2050:
URL: https://github.com/apache/systemds/pull/2050#issuecomment-2227938544

   > > > > Thanks, @WDRshadow, for initiating the project. As discussed before, 
please experiment with realistic use cases such as parallel scoring and 
training. You can use our DNN built-ins.
   > > > 
   > > > 
   > > > Thank you for your comment. My partner @KexingLi22 is writing the test 
classes, which should be ready soon. For DNN testing, we are in the awkward 
situation of not having enough suitable GPUs. As I mentioned above, 
newer graphics cards cannot run with JCuda `10.2.0`. To be precise, `CUDA 10.2` 
is not supported by the RTX 30 series, the A100, or newer graphics cards. My test 
environment lacks older graphics cards. Could you please help us test in a 
multi-GPU environment with suitable GPUs after we have written the test 
classes? Or could you provide a testing environment for us?
   > > 
   > > 
   > > Thanks for clarifying. Unfortunately, at this point, we cannot provide a 
setup. Once you are done with the project, I can run some performance tests 
along with our performance test suites. But during the development period, it is 
not feasible to try every change on our shared node. Without a proper setup of 
two GPUs, it will be very hard to complete this project. I can offer two 
possible directions from here:
   > > 
   > > 1. Try running SystemDS on this setup with one GPU at CUDA 10.2 and the 
other at 11. CUDA 11 has some API differences and may not be able to execute 
all CUDA methods, but you may still have a functioning system. However, I have never 
tried this myself and am unsure about the behavior.
   > > 2. Instead of multi-GPU, first implement a multi-stream single-GPU 
parfor. You need a single GPU with CUDA 10.2. You can use the JCuda API to 
create multiple GPU streams and assign a stream to each parfor thread. This is 
probably a better alternative.
   > 
   > We got a server with two RTX 2080 Ti GPUs and tested the scripts in 
`scripts/nn/example`. Except that `AttentionExample` cannot recognize the operator 
`_map` and `Example-MNIST_2NN_Leaky_ReLu_Softmax` cannot find the source file 
`mnist_2NN.dml`, the others run fine. But I know none of them is optimized 
for multiple GPUs. The only function that is currently optimized for multiple 
GPUs is `parfor`. We will keep testing the scripts in 
`src/test/java/org/apache/sysds/test/functions/parfor` and write new test 
scripts for multi-GPU cases.
   
   Thanks. You do not have to optimize all NN workloads for multi-GPU. 
Implementing robust parfor support is sufficient for this project. 
   Please write a scoring scenario using parfor. Create a random matrix of test 
images and take one of the models. For each row, call the forward pass from 
within a parfor, allowing parallel scoring. Store the inferred class in a 
separate vector. I hope to see some performance improvement from utilizing 
multiple GPUs.
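
The scoring scenario above can be illustrated with a small, language-neutral sketch. It is written in plain Python rather than DML and does not use any SystemDS or JCuda API: the `forward` function is a hypothetical single-layer stand-in for a real model, and a thread pool stands in for parfor workers. The point is only the shape of the loop: every row is scored independently and each iteration writes to its own slot of a separate result vector, so iterations have no dependencies on one another.

```python
from concurrent.futures import ThreadPoolExecutor
import random


def forward(row, weights, bias):
    """Hypothetical forward pass: one linear layer, one score per class.

    weights holds one weight vector per class (n_classes x n_features).
    """
    return [sum(x * w for x, w in zip(row, class_weights)) + b
            for class_weights, b in zip(weights, bias)]


def score_rows(images, weights, bias, num_workers=2):
    """Score every row independently, like the body of a parfor loop."""
    predicted = [0] * len(images)  # separate result vector, one slot per row

    def body(i):
        scores = forward(images[i], weights, bias)
        # Index of the highest score is the inferred class for row i.
        predicted[i] = max(range(len(scores)), key=scores.__getitem__)

    # Each worker stands in for one parfor thread (assumption: in the real
    # system each such thread would be bound to its own GPU).
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        list(pool.map(body, range(len(images))))
    return predicted


if __name__ == "__main__":
    rng = random.Random(42)
    n_images, n_features, n_classes = 8, 16, 3
    images = [[rng.random() for _ in range(n_features)]
              for _ in range(n_images)]
    weights = [[rng.uniform(-1, 1) for _ in range(n_features)]
               for _ in range(n_classes)]
    bias = [0.0] * n_classes
    print(score_rows(images, weights, bias))
```

Python threads do not give real CPU parallelism here, so this sketch shows the structure of the workload, not a speedup. Because the iterations are independent, the result should be the same for any number of workers, which is a useful correctness check before measuring performance.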
   The parfor tests are not ideal for this project because the operations in those 
scripts were not targeted at GPUs, so you may not see any speedups. However, you 
can use those tests for unit testing.
   Did you verify that you are actually using both GPUs?
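
One possible way to check whether both GPUs are busy is to sample `nvidia-smi` while the script runs. The sketch below assumes `nvidia-smi` is on the PATH and that the driver reports numeric utilization values (some devices report `[N/A]`, which this parser does not handle). The helper names `parse_gpu_stats`, `busy_gpus`, and `query_gpu_stats` are made up for illustration and are not part of SystemDS.

```python
import subprocess

# Query index, GPU utilization (%), and used memory (MiB) as bare CSV.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(text):
    """Parse lines such as '0, 35, 1024' into {index: (util, mem_mib)}."""
    stats = {}
    for line in text.strip().splitlines():
        index, util, mem = (field.strip() for field in line.split(","))
        stats[int(index)] = (int(util), int(mem))
    return stats


def busy_gpus(stats, min_util=1):
    """Return the sorted indices of GPUs at or above min_util percent."""
    return sorted(i for i, (util, _) in stats.items() if util >= min_util)


def query_gpu_stats():
    """Run nvidia-smi once and return the parsed statistics."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_stats(result.stdout)
```

Calling `busy_gpus(query_gpu_stats())` repeatedly during a parfor run should list both device indices if work is really being distributed. A single sample can miss short kernels, so several samples over the run are more reliable than one.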


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@systemds.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
