KexingLi22 commented on PR #2050:
URL: https://github.com/apache/systemds/pull/2050#issuecomment-2231948344

   > > > > > Thanks, @WDRshadow, for initiating the project. As discussed 
before, please experiment with realistic use cases such as parallel scoring and 
training. You can use our DNN built-ins.
   > > > > 
   > > > > 
   > > > > Thank you for your comment. My partner @KexingLi22 is writing the 
test classes. We will see it soon. For DNN testing, we were faced with the 
awkward situation of not having enough suitable GPUs for testing. As I 
mentioned above, newer graphics cards can not run on Jcuda `10.2.0`. To be 
precise, `CUDA 10.2` is not supported by RTX30 series, A100 and newer graphics 
cards. My test environment lacks older graphics cards. Could you please help us 
to test in a multi-GPU environment with suitable GPUs after we have written the 
test classes? Or again, could you provide any testing environment for us?
   > > > 
   > > > 
   > > > Thanks for clarifying. Unfortunately, at this point, we cannot provide 
a setup. Once you are done with the project, I can run some performance tests 
along with our performance test suites. But during the development period, it is 
not feasible to try every change in our shared node. Without a proper setup of 
two GPUs, it will be very hard to complete this project. I can offer two 
possible directions from here:
   > > > 
   > > > 1. Try running SystemDS on this setup with one GPU at CUDA 10.2 and 
the other at 11. CUDA 11 has some API differences and may not be able to 
execute all CUDA methods, but you may still have a functioning system. However, 
I have never tried this myself and am unsure about the behavior.
   > > > 2. Instead of multi-GPU, first implement a multi-stream single-GPU 
parfor. You need a single GPU with CUDA 10.2. You can use the Jcuda API to 
create multiple GPU streams, and assign a stream to each parfor thread. This is 
probably a better alternative.
   > > 
   > > 
   > > We got access to a dual RTX 2080 Ti server and tested the scripts in 
`scripts/nn/example`. Two scripts fail: `AttentionExample` does not recognize 
the operator `_map`, and `Example-MNIST_2NN_Leaky_ReLu_Softmax` cannot find the 
source file `mnist_2NN.dml`. All the others run fine, but none of them are 
optimized for multiple GPUs. The only function that is currently optimized for 
multiple GPUs is `parfor`. We will keep testing the scripts in 
`src/test/java/org/apache/sysds/test/functions/parfor` and write new test 
scripts for multi-GPU cases.
   > 
   > Thanks. You do not have to optimize all NN workloads for multi-GPU. Just 
implementing robust parfor support is sufficient for this project. Please 
write a scoring scenario using parfor. Create a random matrix of test images 
and take one of the models. For each row, call the forward pass from within a 
parfor, allowing parallel scoring. Store the inferred class in a separate 
vector. I hope to see some performance improvement from utilizing multiple 
GPUs. The parfor tests are not ideal for this project as the operations in 
those scripts were not targeted for GPUs. You may not see any speedups. 
However, you can use those tests for unit testing. Did you verify that you are 
actually using both GPUs?
   
   
   Thanks for your suggestion, @phaniarnab.
   
   We have written a test class, MultiGPUTest.java, with a single-GPU test case 
and a multi-GPU test case. Both run a script in which an EfficientNet model is 
trained and then predicts using parfor.
    
   Everything works, and the execution time is 35 s 121 ms on a single GPU 
versus 27 s 378 ms on multiple GPUs, a speedup of about 1.28x.
    
   Following the advice from @WDRshadow, I also tried adding a logger instance 
to both ParForBody and GPUContext to trace which thread uses which GPUContext. 
I have already added the following to log4j.properties:

    # Enable detailed logging for specific classes
    log4j.logger.org.apache.sysds.runtime.controlprogram.parfor.ParForBody=DEBUG
    log4j.logger.org.apache.sysds.runtime.instructions.gpu.context.GPUContext=DEBUG
   
   But when I run the test with the DML script that uses parfor, none of the 
output I expected appears, for example:
   24/07/16 10:00:00 DEBUG ParForBody - Thread Thread-1 assigned to GPU context 0
   
   How can I solve this problem?
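
   For reference, here is a minimal plain-Java sketch of the kind of trace I 
am trying to get. It is not SystemDS code: the class `ParforTraceSketch`, its 
methods, and the round-robin assignment of worker threads to GPU contexts are 
all hypothetical and only illustrate the thread-to-context mapping I expect to 
see in the log.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the trace expected from parfor workers.
// Nothing here uses SystemDS or JCuda; device ids are plain integers.
public class ParforTraceSketch {

    // Assumed policy: worker i uses device (i mod numDevices).
    static int deviceFor(int workerId, int numDevices) {
        return workerId % numDevices;
    }

    // Builds a message in the same shape as the expected log line.
    static String traceLine(String threadName, int deviceId) {
        return "Thread " + threadName + " assigned to GPU context " + deviceId;
    }

    // Starts one thread per worker, prints a trace line from each thread,
    // and returns the thread-name -> device-id mapping that was recorded.
    static Map<String, Integer> run(int numWorkers, int numDevices) {
        Map<String, Integer> assignment = new ConcurrentHashMap<>();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < numWorkers; i++) {
            final int workerId = i;
            Thread t = new Thread(() -> {
                String name = Thread.currentThread().getName();
                int dev = deviceFor(workerId, numDevices);
                assignment.put(name, dev);
                System.out.println(traceLine(name, dev));
            }, "Thread-" + workerId);
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join(); // wait until every worker has recorded its device
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        run(4, 2); // four workers spread over two devices
    }
}
```

   With two devices, a run like this shows both context ids in the output, 
which is the evidence I would like to get from the real parfor run to confirm 
that both GPUs are used.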


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@systemds.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
