phaniarnab commented on PR #2050: URL: https://github.com/apache/systemds/pull/2050#issuecomment-2227938544
> > > > Thanks, @WDRshadow, for initiating the project. As discussed before, please experiment with realistic use cases such as parallel scoring and training. You can use our DNN built-ins.
> > >
> > > Thank you for your comment. My partner @KexingLi22 is writing the test classes. We will see it soon. For DNN testing, we were faced with the awkward situation of not having enough suitable GPUs for testing. As I mentioned above, newer graphics cards cannot run on JCuda `10.2.0`. To be precise, `CUDA 10.2` is not supported by the RTX 30 series, A100, and newer graphics cards. My test environment lacks older graphics cards. Could you please help us test in a multi-GPU environment with suitable GPUs after we have written the test classes? Or could you provide a testing environment for us?
> >
> > Thanks for clarifying. Unfortunately, at this point, we cannot provide a setup. Once you are done with the project, I can run some performance tests along with our performance test suites. But during the development period, it is not feasible to try every change on our shared node. Without a proper setup of two GPUs, it will be very hard to complete this project. I can offer two possible directions from here:
> >
> > 1. Try running SystemDS on this setup with one GPU at CUDA 10.2 and the other at 11. CUDA 11 has some API differences and may not be able to execute all CUDA methods, but you may still have a functioning system. However, I never tried this myself and am unsure about the behavior.
> > 2. Instead of multi-GPU, first implement a multi-stream single-GPU parfor. You need a single GPU with CUDA 10.2. You can use the JCuda API to create multiple GPU streams and assign a stream to each parfor thread. This is probably a better alternative.
>
> We got a double RTX2080Ti server and tested scripts in `scripts/nn/example`.
> Except for `AttentionExample`, which can't recognize the operator `_map`, and `Example-MNIST_2NN_Leaky_ReLu_Softmax`, which can't find the source file `mnist_2NN.dml`, the others run fine. But I know none of them are optimized for multiple GPUs. The only function that is currently optimized for multiple GPUs is `parfor`. We will keep testing the scripts in `src/test/java/org/apache/sysds/test/functions/parfor` and write new test scripts for multi-GPU cases. Thanks.

You do not have to optimize all NN workloads for multi-GPU. Implementing robust parfor support is sufficient for this project.

Please write a scoring scenario using parfor. Create a random matrix of test images and take one of the models. For each row, call the forward pass from within a parfor, allowing parallel scoring, and store the inferred class in a separate vector. I hope to see some performance improvement from utilizing multiple GPUs.

The existing parfor tests are not ideal for this project, as the operations in those scripts were not targeted at GPUs, so you may not see any speedups. However, you can use those tests for unit testing.

Did you verify that you are actually using both GPUs?
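
The scoring scenario described above could look roughly like the following DML sketch. It is an illustration, not code from the PR: the sizes `N`, `D`, `H`, `K`, the randomly initialized two-layer model, and the `source` paths (taken to be relative to the `scripts` directory) are all assumptions, and a trained model from `scripts/nn` could be substituted for the random weights.

```dml
# Hypothetical parfor scoring script: random "images", random 2-layer model.
source("nn/layers/affine.dml") as affine
source("nn/layers/relu.dml") as relu
source("nn/layers/softmax.dml") as softmax

N = 1024   # number of test images (assumed)
D = 784    # features per image (assumed, MNIST-sized)
H = 128    # hidden units (assumed)
K = 10     # number of classes (assumed)

# Random test images and random model weights (stand-ins for a trained model)
X  = rand(rows=N, cols=D, min=0, max=1, seed=42)
W1 = rand(rows=D, cols=H, min=-0.1, max=0.1, seed=1)
b1 = matrix(0, rows=1, cols=H)
W2 = rand(rows=H, cols=K, min=-0.1, max=0.1, seed=2)
b2 = matrix(0, rows=1, cols=K)

# One predicted class per image, filled in by the parfor workers
pred = matrix(0, rows=N, cols=1)

parfor (i in 1:N) {
  # Forward pass for a single row
  out1  = affine::forward(X[i,], W1, b1)
  h     = relu::forward(out1)
  out2  = affine::forward(h, W2, b2)
  probs = softmax::forward(out2)
  # Index (1-based) of the most probable class
  pred[i,1] = rowIndexMax(probs)
}

print("Predicted classes range from " + min(pred) + " to " + max(pred))
```

Each iteration writes only its own row of `pred`, so the iterations are independent and can be distributed across parfor workers. Whether the workers actually land on both devices depends on the parfor changes in this PR and on the GPU configuration; watching per-device utilization with `nvidia-smi` while the script runs is one way to check.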