chris-1187 commented on code in PR #2186:
URL: https://github.com/apache/systemds/pull/2186#discussion_r1941499096


##########
scripts/staging/cuda-counter-based-prng/readme.md:
##########
@@ -0,0 +1,410 @@
+# CUDA counter-based PRNG
+
+Currently, random matrices are generated by Java implementations: either the Java Random class or the custom
+counter-based Philox4x64 implementation. This is inefficient for large matrices for two reasons: generation in Java is slow,
+and the matrix has to be copied from main memory to the GPU's memory before matrix operations can run there.
+We propose to implement a counter-based PRNG in CUDA to generate random matrices directly on the GPU.
+
+To be consistent with the current counter-based PRNG implementation, we will use the Philox4x64 algorithm.
+Unfortunately, the CUDA cuRAND library is not open source, and we failed to replicate the numbers it generates
+with a Java implementation. We therefore propose to use the Random123 library, an open-source
+implementation of the Philox4x64 algorithm under the BSD-3 license. The Random123 library is available
+at https://github.com/DEShawResearch/random123. It is well tested using statistical tests, as described in the
+paper [Parallel random numbers: as easy as 1, 2, 3](https://doi.org/10.1145/2063384.2063405).
+
+## How to implement
+
+There are two ways to integrate CUDA kernels into the SystemDS project. The first is to ship a precompiled
+CUDA PTX file with the SystemDS project. The drawback is that the PTX file has to be compiled for each
+CUDA version and each GPU architecture.
+
+The second way is to compile the CUDA kernels at runtime. This means the CUDA build tools need to be installed
+on the system where SystemDS is running, but the PTX file can be compiled for the specific CUDA
+version and GPU architecture.
+
+### Precompiled CUDA PTX file
+
+Example CUDA kernel:
+
+```c++
+#include <cstdint>
+#include <Random123/philox.h>
+
+extern "C" __global__ void philox_4_64(uint64_t* output, uint64_t seed, uint64_t startingCounter, size_t numElements) {
+    // Calculate the thread's unique index; widen before multiplying to avoid 32-bit overflow
+    uint64_t idx = static_cast<uint64_t>(blockIdx.x) * blockDim.x + threadIdx.x;
+
+    // Each thread writes up to 4 values, so only threads with idx * 4 in bounds do any work
+    if (idx * 4 < numElements) {
+        // Initialize the Philox generator with a unique counter and key
+        r123::Philox4x64 rng;
+        // Counter (startingCounter + thread index)
+        r123::Philox4x64::ctr_type ctr = {{startingCounter + idx, 0, 0, 0}};
+        // Key (seed); the second key word is zero-initialized
+        r123::Philox4x64::key_type key = {{seed}};
+
+        // Generate 4 random integers
+        r123::Philox4x64::ctr_type result = rng(ctr, key);
+
+        for (int i = 0; i < 4; ++i) {
+            size_t outputIdx = idx * 4 + i;
+
+            // Ensure we don't exceed the output array bounds
+            if (outputIdx < numElements) {
+                output[outputIdx] = result[i];
+            }
+        }
+    }
+}
+```
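Because each thread in the kernel above writes up to four outputs, the host has to launch roughly a quarter as many threads as there are elements. The following is a minimal host-side sketch of that arithmetic. The helper names `threadsNeeded`, `blocksNeeded`, and `counterForElement` are hypothetical and not part of the PR; they only mirror the index mapping used in `philox_4_64`.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of the PR): number of threads needed when
// each thread writes up to 4 outputs, as in philox_4_64.
constexpr std::size_t threadsNeeded(std::size_t numElements) {
    return (numElements + 3) / 4;  // ceil(numElements / 4)
}

// Hypothetical helper: number of blocks needed to cover all threads
// for a given block size (threads per block).
constexpr std::size_t blocksNeeded(std::size_t numElements, std::size_t blockSize) {
    return (threadsNeeded(numElements) + blockSize - 1) / blockSize;
}

// Hypothetical helper: counter value used by the thread that produces
// output element i, matching ctr[0] = startingCounter + idx with idx = i / 4.
constexpr std::uint64_t counterForElement(std::uint64_t startingCounter, std::size_t i) {
    return startingCounter + i / 4;
}
```

Under these assumptions, a launch would use `blocksNeeded(numElements, blockSize)` blocks of `blockSize` threads, and a later call that should continue the same stream would start at `startingCounter + threadsNeeded(numElements)`.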
+
+To compile the CUDA kernel to a PTX file, you can use the following command:
+
+```bash
+/usr/local/cuda/bin/nvcc kernel.cu -ccbin gcc-8 -lstdc++ -I ./random123/include -o cuda_test.ptx -lm --ptx -std=c++11 --gpu-architecture=sm_70
+```
+
+This will compile the CUDA kernel to a PTX file that can be shipped with the SystemDS project.
+
+```ptx
+.version 6.5
+.target sm_70
+.address_size 64
+
+.visible .entry philox_4_64(
+       .param .u64 philox_4_64_param_0,
+       .param .u64 philox_4_64_param_1,
+       .param .u64 philox_4_64_param_2,
+       .param .u64 philox_4_64_param_3
+)
+{
+    ... cuda kernel code ...
+}
+
+```
+To use this PTX file in the SystemDS project, you can use the following code:
+
+```java
+import jcuda.*;

Review Comment:
   resolved.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@systemds.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
