ptrendx commented on issue #15928: [RFC] A faster version of Gamma sampling on GPU.
URL: https://github.com/apache/incubator-mxnet/issues/15928#issuecomment-522289104

@yzhliu No. What MXNet currently does is a scheme where, yes, each thread is statically assigned some number of elements, but it runs a separate while loop for each of them. The scheme I proposed has a single while loop that processes all elements assigned to a given thread.

There is a big difference between these approaches, due to the SIMT architecture of the GPU. Basically, you can treat a group of threads (called a warp, 32 threads on NVIDIA GPUs) as lanes of a SIMD vector instruction on the CPU. This means that if 1 thread needs to perform some computation, all threads in the warp need to execute the same instruction (and possibly discard the result). So in the current MXNet implementation, for each output element every group of 32 threads always does as many loop iterations as its slowest thread needs (because no thread in the warp can exit the while loop while at least 1 thread is still not finished).

In the proposed implementation there is only 1 while loop, and the only difference between threads lies inside the `if (accepted)` part, which is cheap compared to generating a random number. In this implementation every warp does as many loop iterations as its slowest thread needs in total, summed over all of that thread's elements (this total is hopefully pretty uniform across threads, especially as we are talking about RNG and not some crafted input, and definitely much better than the previous "for each element take the slowest and sum that").

@xidulu What RNG is used for the host-side and device-side APIs? The cuRAND ones should not really differ much in performance between device-side and host-side.
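To make the iteration-count argument concrete, here is a small CPU-side simulation in Python. It is only a cost model, not MXNet code: the function names, the acceptance probability of 0.7, and the 16 elements per thread are assumptions chosen for illustration, and the cost of the `if (accepted)` branch is ignored, in line with the claim that it is cheap next to generating a random number.

```python
import random

WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def draws_until_accept(rng, accept_prob):
    """Number of rejection-sampling attempts until the first acceptance."""
    attempts = 1
    while rng.random() >= accept_prob:
        attempts += 1
    return attempts


def per_element_loop_cost(attempts):
    """Warp iterations with one while loop per element.

    attempts[lane][k] is how many attempts `lane` needs for its k-th element.
    For every element the whole warp waits for its slowest lane, so the cost
    is the sum over elements of the per-element maximum.
    """
    n_elems = len(attempts[0])
    return sum(max(lane[k] for lane in attempts) for k in range(n_elems))


def single_loop_cost(attempts):
    """Warp iterations with a single while loop over all elements.

    The warp runs until the lane with the largest total number of attempts
    has finished, so the cost is the maximum over lanes of the per-lane sum.
    """
    return max(sum(lane) for lane in attempts)


# Hypothetical workload: 32 lanes, 16 elements each, 70% acceptance rate.
rng = random.Random(0)
attempts = [[draws_until_accept(rng, 0.7) for _ in range(16)]
            for _ in range(WARP_SIZE)]
print("per-element loops:", per_element_loop_cost(attempts))
print("single loop:      ", single_loop_cost(attempts))
```

Since a maximum of sums can never exceed the sum of the per-element maxima, the single-loop count is never larger under this model. How large the gap is depends on the acceptance rate and on the number of elements per thread.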
There are a few advantages:
- you don't need to store and load the random numbers you generated (and in the fully optimized case generating random numbers should actually be a pretty bandwidth-limited operation)
- you don't need additional storage (besides the RNG state, which you need anyway)
- you compute only as many random numbers as you really need
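The points above can be sketched for a single thread in plain Python. This is a hedged illustration and not the MXNet kernel: the comment does not say which rejection method is used, so the Marsaglia-Tsang acceptance test (valid for shape `alpha >= 1`) stands in here, and `sample_gamma_single_loop` and its draw counter are hypothetical names.

```python
import math
import random


def sample_gamma_single_loop(alphas, rng):
    """Draw one Gamma(alpha, 1) sample per entry of `alphas` in one while loop.

    Assumes every alpha >= 1. Random numbers are generated on demand, so the
    only state carried between iterations is the RNG itself and the index of
    the element currently being worked on.
    """
    out = [0.0] * len(alphas)
    draws = 0  # how many random numbers were actually generated
    i = 0      # index of the element this "thread" is currently sampling
    while i < len(alphas):
        d = alphas[i] - 1.0 / 3.0
        c = 1.0 / math.sqrt(9.0 * d)
        x = rng.gauss(0.0, 1.0)
        u = 1.0 - rng.random()  # in (0, 1], keeps log(u) finite
        draws += 2
        v = (1.0 + c * x) ** 3
        accepted = v > 0.0 and math.log(u) < 0.5 * x * x + d - d * v + d * math.log(v)
        if accepted:
            # The only part that differs between threads: store and advance.
            out[i] = d * v
            i += 1
    return out, draws


samples, draws = sample_gamma_single_loop([1.0, 2.5, 4.0, 9.0], random.Random(0))
print(samples, draws)
```

No buffer of pre-generated random numbers appears anywhere in the loop. A host-side approach would instead have to guess a budget of random numbers up front, write them to memory, and read them back.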