Repository: incubator-systemml
Updated Branches:
  refs/heads/master ed3a15882 -> 3757995b5


Upgraded to use jcuda8 (from the maven repo)

Closes #291


Project: http://git-wip-us.apache.org/repos/asf/incubator-systemml/repo
Commit: 
http://git-wip-us.apache.org/repos/asf/incubator-systemml/commit/3757995b
Tree: http://git-wip-us.apache.org/repos/asf/incubator-systemml/tree/3757995b
Diff: http://git-wip-us.apache.org/repos/asf/incubator-systemml/diff/3757995b

Branch: refs/heads/master
Commit: 3757995b50aef019b0ce22d9ae93eae42aed02b4
Parents: ed3a158
Author: Nakul Jindal <naku...@gmail.com>
Authored: Fri Mar 3 18:11:45 2017 -0800
Committer: Nakul Jindal <naku...@gmail.com>
Committed: Fri Mar 3 18:11:46 2017 -0800

----------------------------------------------------------------------
 docs/devdocs/gpu-backend.md                     |  61 +++---
 pom.xml                                         | 195 +++++++++++++++----
 .../runtime/matrix/data/LibMatrixCUDA.java      |  19 +-
 3 files changed, 195 insertions(+), 80 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/3757995b/docs/devdocs/gpu-backend.md
----------------------------------------------------------------------
diff --git a/docs/devdocs/gpu-backend.md b/docs/devdocs/gpu-backend.md
index c6f66d6..40311c7 100644
--- a/docs/devdocs/gpu-backend.md
+++ b/docs/devdocs/gpu-backend.md
@@ -19,52 +19,43 @@ limitations under the License.
 
 # Initial prototype for GPU backend
 
-A GPU backend implements two important abstract classes:
+The GPU backend implements two important abstract classes:
 1. `org.apache.sysml.runtime.controlprogram.context.GPUContext`
 2. `org.apache.sysml.runtime.controlprogram.context.GPUObject`
 
-The GPUContext is responsible for GPU memory management and 
initialization/destruction of Cuda handles.
+The `GPUContext` is responsible for GPU memory management and 
initialization/destruction of Cuda handles.
+Currently, an active instance of the `GPUContext` class is made available 
globally and is used to store handles
+of the allocated blocks on the GPU. A count is kept per block for the number 
of instructions that need it.
+When the count is 0, the block may be evicted on a call to `GPUObject.evict()`.
 
-A GPUObject (like RDDObject and BroadcastObject) is stored in CacheableData 
object. It gets call-backs from SystemML's bufferpool on following methods
+A `GPUObject` (like RDDObject and BroadcastObject) is stored in CacheableData 
object. It gets call-backs from SystemML's bufferpool on following methods
 1. void acquireDeviceRead()
-2. void acquireDenseDeviceModify(int numElemsToAllocate)
-3. void acquireHostRead()
-4. void acquireHostModify()
-5. void release(boolean isGPUCopyModified)
+2. void acquireDeviceModifyDense()
+3. void acquireDeviceModifySparse
+4. void acquireHostRead()
+5. void acquireHostModify()
+6. void releaseInput()
+7. void releaseOutput()
 
-## JCudaContext:
-The current prototype supports Nvidia's CUDA libraries using JCuda wrapper. 
The implementation for the above classes can be found in:
-1. `org.apache.sysml.runtime.controlprogram.context.JCudaContext`
-2. `org.apache.sysml.runtime.controlprogram.context.JCudaObject`
+Sparse matrices on GPU are represented in `CSR` format. In the SystemML 
runtime, they are represented in `MCSR` or modified `CSR` format.
+A conversion cost is incurred when sparse matrices are sent back and forth 
between host and device memory.
 
-### Setup instructions for JCudaContext:
+Concrete classes `JCudaContext` and `JCudaObject` (which extend `GPUContext` & 
`GPUObject` respectively) contain references to `org.jcuda.*`.
 
-1. Follow the instructions from `https://developer.nvidia.com/cuda-downloads` 
and install CUDA 7.5.
-2. Follow the instructions from `https://developer.nvidia.com/cudnn` and 
install CuDNN v4.
-3. Download install JCuda binaries version 0.7.5b and JCudnn version 0.7.5. 
Easiest option would be to use mavenized jcuda: 
-```python
-git clone https://github.com/MysterionRise/mavenized-jcuda.git
-mvn -Djcuda.version=0.7.5b -Djcudnn.version=0.7.5 clean package
-CURR_DIR=`pwd`
-JCUDA_PATH=$CURR_DIR"/target/lib/"
-JAR_PATH="."
-for j in `ls $JCUDA_PATH/*.jar`
-do
-        JAR_PATH=$JAR_PATH":"$j
-done
-export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$JCUDA_PATH
-```
+The `LibMatrixCUDA` class contains methods to invoke CUDA libraries (where 
available) and invoke custom kernels. 
+Runtime classes (that extend `GPUInstruction`) redirect calls to functions in 
this class.
+Some functions in `LibMatrixCUDA` need finer control over GPU memory 
management primitives. These are provided by `JCudaObject`.
+
+### Setup instructions:
 
-Note for Windows users:
-* CuDNN v4 is available to download: 
`http://developer.download.nvidia.com/compute/redist/cudnn/v4/cudnn-7.0-win-x64-v4.0-prod.zip`
-* If above steps doesn't work for JCuda, copy the DLLs into C:\lib (or /lib) 
directory.
+1. Follow the instructions from `https://developer.nvidia.com/cuda-downloads` 
and install CUDA 8.0.
+2. Follow the instructions from `https://developer.nvidia.com/cudnn` and 
install CuDNN v5.1.
 
-To use SystemML's GPU backend, 
+To use SystemML's GPU backend when using the jar or uber-jar
 1. Add JCuda's jar into the classpath.
-2. Include CUDA, CuDNN and JCuda's libraries in LD_LIBRARY_PATH (or using 
-Djava.library.path).
-3. Use `-gpu` flag.
+2. Use `-gpu` flag.
 
 For example: to use GPU backend in standalone mode:
-```python
-java -classpath $JAR_PATH:systemml-0.10.0-incubating-SNAPSHOT-standalone.jar 
org.apache.sysml.api.DMLScript -f MyDML.dml -gpu -exec singlenode ... 
+```bash
+java -classpath $JAR_PATH:systemml-0.14.0-incubating-SNAPSHOT-standalone.jar 
org.apache.sysml.api.DMLScript -f MyDML.dml -gpu -exec singlenode ... 
 ```

http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/3757995b/pom.xml
----------------------------------------------------------------------
diff --git a/pom.xml b/pom.xml
index e044c05..f54b058 100644
--- a/pom.xml
+++ b/pom.xml
@@ -71,10 +71,12 @@
                <scala.test.version>2.2.6</scala.test.version>
                <maven.build.timestamp.format>yyyy-MM-dd HH:mm:ss 
z</maven.build.timestamp.format>
                <enableGPU>false</enableGPU>
+               <jcuda.scope>provided</jcuda.scope>
+               <jcuda.version>0.8.0</jcuda.version>
                <!-- OS-specific JVM arguments for running integration tests -->
                <integrationTestExtraJVMArgs />
        </properties>
-       
+
        <repositories>
         <repository>
           <id>central</id>
@@ -83,14 +85,6 @@
             <enabled>true</enabled>
           </releases>
         </repository>
-        <repository>
-            <id>mavenized-jcuda-mvn-repo</id>
-            
<url>https://raw.github.com/niketanpansare/mavenized-jcuda/mvn-repo/</url>
-            <snapshots>
-                <enabled>true</enabled>
-                <updatePolicy>always</updatePolicy>
-            </snapshots>
-        </repository>
     </repositories>
 
        <build>
@@ -169,6 +163,13 @@
                                                <goals>
                                                        <goal>shade</goal>
                                                </goals>
+                                               <configuration>
+                                                 <artifactSet>
+                                                         <!--<excludes>
+                                                               
<exclude>org.jcuda:*</exclude>
+                                                       </excludes>-->
+                                                 </artifactSet>
+                                               </configuration>
                                        </execution>
                                </executions>
 
@@ -568,6 +569,60 @@
        </build>
 
        <profiles>
+
+               <profile>
+                       <id>windows-x86_64</id>
+                       <activation>
+                               <os>
+                                       <family>windows</family>
+                                       <arch>amd64</arch>
+                               </os>
+                       </activation>
+                       <properties>
+                               <jcuda.os>windows</jcuda.os>
+                               <jcuda.arch>x86_64</jcuda.arch>
+                       </properties>
+               </profile>
+               <profile>
+                       <id>linux-x86_64</id>
+                       <activation>
+                               <os>
+                                       <family>unix</family>
+                                       <arch>amd64</arch>
+                               </os>
+                       </activation>
+                       <properties>
+                               <jcuda.os>linux</jcuda.os>
+                               <jcuda.arch>x86_64</jcuda.arch>
+                       </properties>
+               </profile>
+               <profile>
+                       <id>apple-x86_64</id>
+                       <activation>
+                               <os>
+                                       <family>mac</family>
+                                       <arch>x86_64</arch>
+                               </os>
+                       </activation>
+                       <properties>
+                               <jcuda.os>apple</jcuda.os>
+                               <jcuda.arch>x86_64</jcuda.arch>
+                       </properties>
+               </profile>
+               <profile>
+                       <id>linux-ppc_64</id>
+                       <activation>
+                               <os>
+                                       <family>unix</family>
+                                       <arch>ppc64le</arch>
+                               </os>
+                       </activation>
+                       <properties>
+                               <jcuda.os>linux</jcuda.os>
+                               <jcuda.arch>ppc_64</jcuda.arch>
+                       </properties>
+               </profile>
+
                <profile>
                        <id>scala-2.10</id>
                        <properties>
@@ -575,7 +630,7 @@
                                
<scala.binary.version>2.10</scala.binary.version>
                        </properties>
                </profile>
-               
+
                <profile>
                        <id>scala-2.11</id>
                        <properties>
@@ -811,7 +866,7 @@
                                                        </execution>
                                                </executions>
                                        </plugin>
-                                       
+
                                        <plugin>
                                                
<artifactId>maven-gpg-plugin</artifactId>
                                                <version>1.6</version>
@@ -1032,50 +1087,112 @@
 
 
        <dependencies>
-       
-               <!-- For GPU backend
-                       Use org.mystic:mavenized-jcuda until Alan puts 
org.jcuda:*
-                -->
-               <dependency>
-                       <groupId>org.mystic</groupId>
-            <artifactId>mavenized-jcuda</artifactId>
-            <version>0.7.5b</version>
-            <type>jar</type>
-            <scope>provided</scope>
-            <exclusions>
-                       <exclusion>
-                           <groupId>*</groupId>
-                           <artifactId>*</artifactId>
-                       </exclusion>
-                   </exclusions>
-               </dependency>
-               <!-- Since there is no mvn repo for jcuda
+
                <dependency>
                        <groupId>org.jcuda</groupId>
                        <artifactId>jcuda</artifactId>
-                       <version>0.7.5b</version>
-                       <scope>provided</scope>
+                       <version>${jcuda.version}</version>
+                       <scope>${jcuda.scope}</scope>
                </dependency>
                <dependency>
                        <groupId>org.jcuda</groupId>
                        <artifactId>jcublas</artifactId>
-                       <version>0.7.5b</version>
-                       <scope>provided</scope>
+                       <version>${jcuda.version}</version>
+                       <scope>${jcuda.scope}</scope>
+               </dependency>
+               <dependency>
+                       <groupId>org.jcuda</groupId>
+                       <artifactId>jcufft</artifactId>
+                       <version>${jcuda.version}</version>
+                       <scope>${jcuda.scope}</scope>
                </dependency>
                <dependency>
                        <groupId>org.jcuda</groupId>
                        <artifactId>jcusparse</artifactId>
-                       <version>0.7.5b</version>
-                       <scope>provided</scope>
+                       <version>${jcuda.version}</version>
+                       <scope>${jcuda.scope}</scope>
+               </dependency>
+               <dependency>
+                       <groupId>org.jcuda</groupId>
+                       <artifactId>jcusolver</artifactId>
+                       <version>${jcuda.version}</version>
+                       <scope>${jcuda.scope}</scope>
+               </dependency>
+               <dependency>
+                       <groupId>org.jcuda</groupId>
+                       <artifactId>jcurand</artifactId>
+                       <version>${jcuda.version}</version>
+                       <scope>${jcuda.scope}</scope>
+               </dependency>
+               <dependency>
+                       <groupId>org.jcuda</groupId>
+                       <artifactId>jnvgraph</artifactId>
+                       <version>${jcuda.version}</version>
+                       <scope>${jcuda.scope}</scope>
                </dependency>
                <dependency>
                        <groupId>org.jcuda</groupId>
                        <artifactId>jcudnn</artifactId>
-                       <version>0.7.5</version>
-                       <scope>provided</scope>
+                       <version>${jcuda.version}</version>
+                       <scope>${jcuda.scope}</scope>
+               </dependency>
+
+               <dependency>
+                       <groupId>org.jcuda</groupId>
+                       <artifactId>jcuda-natives</artifactId>
+                       <classifier>${jcuda.os}-${jcuda.arch}</classifier>
+                       <version>${jcuda.version}</version>
+                       <scope>${jcuda.scope}</scope>
+               </dependency>
+               <dependency>
+                       <groupId>org.jcuda</groupId>
+                       <artifactId>jcublas-natives</artifactId>
+                       <classifier>${jcuda.os}-${jcuda.arch}</classifier>
+                       <version>${jcuda.version}</version>
+                       <scope>${jcuda.scope}</scope>
+               </dependency>
+               <dependency>
+                       <groupId>org.jcuda</groupId>
+                       <artifactId>jcufft-natives</artifactId>
+                       <classifier>${jcuda.os}-${jcuda.arch}</classifier>
+                       <version>${jcuda.version}</version>
+                       <scope>${jcuda.scope}</scope>
+               </dependency>
+               <dependency>
+                       <groupId>org.jcuda</groupId>
+                       <artifactId>jcusparse-natives</artifactId>
+                       <classifier>${jcuda.os}-${jcuda.arch}</classifier>
+                       <version>${jcuda.version}</version>
+                       <scope>${jcuda.scope}</scope>
+               </dependency>
+               <dependency>
+                       <groupId>org.jcuda</groupId>
+                       <artifactId>jcusolver-natives</artifactId>
+                       <classifier>${jcuda.os}-${jcuda.arch}</classifier>
+                       <version>${jcuda.version}</version>
+                       <scope>${jcuda.scope}</scope>
+               </dependency>
+               <dependency>
+                       <groupId>org.jcuda</groupId>
+                       <artifactId>jcurand-natives</artifactId>
+            <classifier>${jcuda.os}-${jcuda.arch}</classifier>
+            <version>${jcuda.version}</version>
+                       <scope>${jcuda.scope}</scope>
+               </dependency>
+               <dependency>
+                       <groupId>org.jcuda</groupId>
+                       <artifactId>jnvgraph-natives</artifactId>
+                       <classifier>${jcuda.os}-${jcuda.arch}</classifier>
+                       <version>${jcuda.version}</version>
+                       <scope>${jcuda.scope}</scope>
+               </dependency>
+               <dependency>
+                       <groupId>org.jcuda</groupId>
+                       <artifactId>jcudnn-natives</artifactId>
+                       <classifier>${jcuda.os}-${jcuda.arch}</classifier>
+                       <version>${jcuda.version}</version>
+                       <scope>${jcuda.scope}</scope>
                </dependency>
-                -->
-               <!-- ************************* -->
 
                <dependency>
                        <groupId>org.apache.spark</groupId>

http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/3757995b/src/main/java/org/apache/sysml/runtime/matrix/data/LibMatrixCUDA.java
----------------------------------------------------------------------
diff --git 
a/src/main/java/org/apache/sysml/runtime/matrix/data/LibMatrixCUDA.java 
b/src/main/java/org/apache/sysml/runtime/matrix/data/LibMatrixCUDA.java
index 31ec348..51a0f6b 100644
--- a/src/main/java/org/apache/sysml/runtime/matrix/data/LibMatrixCUDA.java
+++ b/src/main/java/org/apache/sysml/runtime/matrix/data/LibMatrixCUDA.java
@@ -25,6 +25,7 @@ import static jcuda.jcudnn.JCudnn.cudnnActivationForward;
 import static jcuda.jcudnn.JCudnn.cudnnConvolutionBackwardData;
 import static jcuda.jcudnn.JCudnn.cudnnConvolutionBackwardFilter;
 import static jcuda.jcudnn.JCudnn.cudnnConvolutionForward;
+import static jcuda.jcudnn.JCudnn.cudnnCreateActivationDescriptor;
 import static jcuda.jcudnn.JCudnn.cudnnCreateConvolutionDescriptor;
 import static jcuda.jcudnn.JCudnn.cudnnCreateFilterDescriptor;
 import static jcuda.jcudnn.JCudnn.cudnnCreatePoolingDescriptor;
@@ -38,12 +39,14 @@ import static 
jcuda.jcudnn.JCudnn.cudnnGetConvolutionBackwardFilterWorkspaceSize
 import static jcuda.jcudnn.JCudnn.cudnnGetConvolutionForwardWorkspaceSize;
 import static jcuda.jcudnn.JCudnn.cudnnPoolingBackward;
 import static jcuda.jcudnn.JCudnn.cudnnPoolingForward;
+import static jcuda.jcudnn.JCudnn.cudnnSetActivationDescriptor;
 import static jcuda.jcudnn.JCudnn.cudnnSetConvolution2dDescriptor;
 import static jcuda.jcudnn.JCudnn.cudnnSetFilter4dDescriptor;
 import static jcuda.jcudnn.JCudnn.cudnnSetPooling2dDescriptor;
 import static jcuda.jcudnn.JCudnn.cudnnSetTensor4dDescriptor;
 import static jcuda.jcudnn.cudnnConvolutionMode.CUDNN_CROSS_CORRELATION;
 import static jcuda.jcudnn.cudnnDataType.CUDNN_DATA_DOUBLE;
+import static jcuda.jcudnn.cudnnNanPropagation.CUDNN_PROPAGATE_NAN;
 import static jcuda.jcudnn.cudnnPoolingMode.CUDNN_POOLING_MAX;
 import static jcuda.jcudnn.cudnnTensorFormat.CUDNN_TENSOR_NCHW;
 import static jcuda.jcusparse.JCusparse.cusparseDcsrgemm;
@@ -75,6 +78,7 @@ import jcuda.jcublas.JCublas2;
 import jcuda.jcublas.cublasFillMode;
 import jcuda.jcublas.cublasHandle;
 import jcuda.jcublas.cublasOperation;
+import jcuda.jcudnn.cudnnActivationDescriptor;
 import jcuda.jcudnn.cudnnConvolutionDescriptor;
 import jcuda.jcudnn.cudnnConvolutionFwdPreference;
 import jcuda.jcudnn.cudnnFilterDescriptor;
@@ -268,7 +272,7 @@ public class LibMatrixCUDA {
        private static cudnnFilterDescriptor allocateFilterDescriptor(int K, 
int C, int R, int S) {
                cudnnFilterDescriptor filterDesc = new cudnnFilterDescriptor();
                cudnnCreateFilterDescriptor(filterDesc);
-               cudnnSetFilter4dDescriptor(filterDesc, CUDNN_DATA_DOUBLE, K, C, 
R, S);
+               cudnnSetFilter4dDescriptor(filterDesc, CUDNN_DATA_DOUBLE, 
CUDNN_TENSOR_NCHW, K, C, R, S);
                return filterDesc;
        }
 
@@ -285,7 +289,7 @@ public class LibMatrixCUDA {
        private static cudnnPoolingDescriptor allocatePoolingDescriptor(int R, 
int S, int pad_h, int pad_w, int stride_h, int stride_w) {
                cudnnPoolingDescriptor poolingDesc = new 
cudnnPoolingDescriptor();
                cudnnCreatePoolingDescriptor(poolingDesc);
-               cudnnSetPooling2dDescriptor(poolingDesc, CUDNN_POOLING_MAX, R, 
S, pad_h, pad_w, stride_h, stride_w);
+               cudnnSetPooling2dDescriptor(poolingDesc, CUDNN_POOLING_MAX, 
CUDNN_PROPAGATE_NAN, R, S, pad_h, pad_w, stride_h, stride_w);
                return poolingDesc;
        }
 
@@ -474,10 +478,13 @@ public class LibMatrixCUDA {
                                // Allocate descriptors
                                srcTensorDesc = 
allocateTensorDescriptor((int)N, 1, (int)H, (int)W);
                                dstTensorDesc = 
allocateTensorDescriptor((int)N, 1, (int)H, (int)W);
-
-                               cudnnActivationForward(cudnnHandle, 
CUDNN_ACTIVATION_RELU,
-                                                               alpha, 
srcTensorDesc, srcData,
-                                                               beta, 
dstTensorDesc, dstData);
+                cudnnActivationDescriptor activationDescriptor = new 
cudnnActivationDescriptor();
+                               
cudnnCreateActivationDescriptor(activationDescriptor);
+                double dummy = -1;
+                cudnnSetActivationDescriptor(activationDescriptor, 
CUDNN_ACTIVATION_RELU, CUDNN_PROPAGATE_NAN, dummy);
+                   cudnnActivationForward(cudnnHandle, activationDescriptor,
+                       alpha, srcTensorDesc, srcData, 
+                       beta, dstTensorDesc, dstData);
                        }
                }
                finally {

Reply via email to