ptrendx commented on issue #15928: [RFC] A faster version of Gamma sampling on GPU.
URL: https://github.com/apache/incubator-mxnet/issues/15928#issuecomment-522289104
 
 
   @yzhliu No. What MXNet currently does is a scheme where, yes, each thread 
gets a statically assigned number of elements, but it runs a separate while 
loop for each of them. The scheme I proposed has a single while loop that 
processes all elements assigned to a given thread. There is a big difference 
between these approaches, due to the SIMT architecture of the GPU. Basically 
you can treat a group of threads (called a warp, 32 threads on NVIDIA GPUs) as 
lanes of a SIMD vector instruction on the CPU. This means that if 1 thread 
needs to perform some computation, all threads in the warp need to execute the 
same instruction (and possibly discard the result).
   So in MXNet's current implementation, for each output element every group 
of 32 threads always does as many loop iterations as its slowest thread 
(because no thread in a warp can exit the while loop while at least 1 thread 
is still not finished).
   In the proposed implementation there is only 1 while loop and the only 
difference between threads lies inside the `if (accepted)` part, which is cheap 
compared to generating a random number. In this implementation every warp does 
as many loop iterations as its slowest thread needs in total across all of its 
elements (a total which is hopefully pretty uniform across threads, especially 
as we are talking RNG and not some crafted input, and definitely much better 
than the previous "for each element take the slowest and sum that").
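
   To make the difference concrete, here is a small host-side Python model of 
one warp (a rough sketch, not MXNet's kernel). It assumes each element needs a 
geometrically distributed number of rejection-sampling trials with a made-up 
acceptance probability `accept_prob`; the function names are hypothetical, and 
the cost of the `if (accepted)` branch is ignored.

```python
import random

WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def draw_attempts(rng, accept_prob):
    """Trials until the first acceptance (assumed geometric model)."""
    attempts = 1
    while rng.random() >= accept_prob:
        attempts += 1
    return attempts


def simulate_warp(elems_per_thread, accept_prob, seed=0):
    """Return (per_element_iters, single_loop_iters) for one warp."""
    rng = random.Random(seed)
    # attempts[t][j]: trials thread t needs for its j-th element
    attempts = [
        [draw_attempts(rng, accept_prob) for _ in range(elems_per_thread)]
        for _ in range(WARP_SIZE)
    ]
    # One while loop per element: for every element the warp waits for
    # its slowest thread, and those maxima add up.
    per_element = sum(
        max(attempts[t][j] for t in range(WARP_SIZE))
        for j in range(elems_per_thread)
    )
    # Single while loop: the warp runs until the thread with the largest
    # total number of trials is done.
    single_loop = max(sum(row) for row in attempts)
    return per_element, single_loop


if __name__ == "__main__":
    per_element, single_loop = simulate_warp(elems_per_thread=16, accept_prob=0.7)
    print("per-element loops:", per_element)
    print("single loop:      ", single_loop)
```

   Since the maximum of sums can never exceed the sum of maxima, the single 
loop count is never larger than the per-element count in this model; how big 
the gap is depends on the assumed acceptance probability.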
   
   @xidulu Which RNG is used for the host-side and the device-side API? The 
cuRAND ones should not really differ much in perf between device-side and 
host-side.
   There are a few advantages of the device-side API:
    - you don't need to store and load the random numbers you made (and in the 
fully optimized case making random numbers should actually be a pretty 
bandwidth-limited operation)
    - you don't need additional storage (besides the RNG state, which you 
need anyway)
    - you compute only as many random numbers as you really need
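
   The points above can be illustrated with a small host-side Python model 
(again only a sketch, not cuRAND code). The function names, the acceptance 
probability and the fixed `budget_per_sample` for the pre-generated buffer are 
all assumptions made for the illustration.

```python
import random


def sample_on_demand(n_samples, accept_prob, seed=0):
    """Draw inside the sampling loop; nothing is stored besides RNG state."""
    rng = random.Random(seed)
    draws = 0
    for _ in range(n_samples):
        while True:
            draws += 1
            if rng.random() < accept_prob:
                break
    return {"generated": draws, "stored": 0, "used": draws}


def sample_from_buffer(n_samples, accept_prob, budget_per_sample, seed=0):
    """Pre-generate a fixed-size buffer first, then consume it."""
    rng = random.Random(seed)
    # Hypothetical over-provisioning: the exact number of draws needed is
    # not known in advance, so a fixed budget per sample is generated.
    buffer = [rng.random() for _ in range(n_samples * budget_per_sample)]
    used = 0
    accepted = 0
    for u in buffer:
        if accepted == n_samples:
            break
        used += 1
        if u < accept_prob:
            accepted += 1
    return {
        "generated": len(buffer),
        "stored": len(buffer),
        "used": used,
        "accepted": accepted,
    }


if __name__ == "__main__":
    print(sample_on_demand(1000, 0.7))
    print(sample_from_buffer(1000, 0.7, budget_per_sample=4))
```

   In this model the buffered variant generates and stores every number in the 
buffer, while the on-demand variant generates exactly the numbers it consumes 
and stores none.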
