Hi Niketan,

Good idea, I think that would be the cleanest solution for now. Since JCuda
doesn't appear to be in a public Maven repo, it adds a layer of difficulty to
clean integration via Maven builds.

Deron
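[Editorial aside: since JCuda would be an optional external dependency (JCuda.jar on the classpath, the native libraries on LD_LIBRARY_PATH, as discussed in the quoted thread below), SystemML would need to detect at runtime whether it is actually present. A minimal, hypothetical sketch of such a probe in Java — the class AcceleratorCheck and method isJCudaAvailable are invented here for illustration; jcuda.Pointer is a real class shipped in JCuda.jar:]

    // Hypothetical sketch only: probe for the optional JCuda dependency
    // before enabling the GPU backend. Not actual SystemML code.
    public class AcceleratorCheck {
        public static boolean isJCudaAvailable() {
            try {
                // Succeeds only if the user put JCuda.jar on the classpath.
                Class.forName("jcuda.Pointer");
                return true;
            } catch (ClassNotFoundException e) {
                // JCuda.jar missing: fall back to plain CP/SPARK/MR instructions.
                return false;
            }
        }
    }

[The native JCuda .so/.dll would still need to be resolvable via LD_LIBRARY_PATH by the time the first GPU instruction runs, as in the spark-env.sh example quoted below.]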
On Wed, May 18, 2016 at 10:55 AM, Niketan Pansare <npan...@us.ibm.com> wrote:

> Hi Deron,
>
> Good points. I vote that we keep JCuda and other accelerators we add as an
> external dependency. This means the user will have to ensure JCuda.jar is
> in the class path and JCuda.DLL/JCuda.so is in the LD_LIBRARY_PATH.
>
> I don't think JCuda.jar is platform-specific.
>
> Thanks,
>
> Niketan Pansare
> IBM Almaden Research Center
> E-mail: npansar At us.ibm.com
> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
>
> From: Deron Eriksson <deroneriks...@gmail.com>
> To: dev@systemml.incubator.apache.org
> Date: 05/18/2016 10:51 AM
> Subject: Re: Discussion on GPU backend
> ------------------------------
>
> Hi,
>
> I'm wondering what would be a good way to handle JCuda in terms of the
> build release packages. Currently we have 11 artifacts that we are
> building:
> systemml-0.10.0-incubating-SNAPSHOT-inmemory.jar
> systemml-0.10.0-incubating-SNAPSHOT-javadoc.jar
> systemml-0.10.0-incubating-SNAPSHOT-sources.jar
> systemml-0.10.0-incubating-SNAPSHOT-src.tar.gz
> systemml-0.10.0-incubating-SNAPSHOT-src.zip
> systemml-0.10.0-incubating-SNAPSHOT-standalone.jar
> systemml-0.10.0-incubating-SNAPSHOT-standalone.tar.gz
> systemml-0.10.0-incubating-SNAPSHOT-standalone.zip
> systemml-0.10.0-incubating-SNAPSHOT.jar
> systemml-0.10.0-incubating-SNAPSHOT.tar.gz
> systemml-0.10.0-incubating-SNAPSHOT.zip
>
> It looks like JCuda is platform-specific, so you typically need different
> jars/DLLs/.so files for each platform. If I'm understanding things
> correctly, if we generated Windows/Linux/LinuxPowerPC/MacOS-specific
> SystemML artifacts for JCuda, we'd potentially have an enormous number of
> artifacts.
>
> Is this something that could potentially be handled by specific profiles
> in the pom, so that a user could run something like "mvn clean package -P
> jcuda-windows" and be responsible for building the platform-specific
> SystemML jar for JCuda? Or is this something that could be handled
> differently, by putting the platform-specific JCuda jar on the classpath
> and any DLLs or other needed libraries on the path?
>
> Deron
>
> On Tue, May 17, 2016 at 10:50 PM, Niketan Pansare <npan...@us.ibm.com>
> wrote:
>
> > Hi Luciano,
> >
> > Like all our backends, there is no change in the programming model. The
> > user submits a DML script and specifies whether she wants to use an
> > accelerator. Assuming that we compile the JCuda jars into SystemML.jar,
> > the user can use the GPU backend with the following command:
> > spark-submit --master yarn-client ... -f MyAlgo.dml -accelerator -exec hybrid_spark
> >
> > The user also needs to set LD_LIBRARY_PATH so that it points to the
> > JCuda DLL or .so files. Please see
> > https://issues.apache.org/jira/browse/SPARK-1720 ... For example, the
> > user can add the following to spark-env.sh:
> > export LD_LIBRARY_PATH=<path to jcuda so>:$LD_LIBRARY_PATH
> >
> > The first version of the GPU backend will only accelerate CP. In this
> > case, we have four types of instructions:
> > 1. CP
> > 2. GPU (requires GPU on the driver)
> > 3. SPARK
> > 4. MR
> >
> > Note, the first version will require the CUDA/JCuda dependency to be
> > installed on the driver only.
> >
> > The next version will accelerate our distributed instructions as well.
> > In this case, we will have six types of instructions:
> > 1. CP
> > 2. GPU
> > 3. SPARK
> > 4. MR
> > 5. SPARK-GPU (requires GPU cluster)
> > 6. MR-GPU (requires GPU cluster)
> >
> > Thanks,
> >
> > Niketan Pansare
> > IBM Almaden Research Center
> > E-mail: npansar At us.ibm.com
> > http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
> >
> > From: Luciano Resende <luckbr1...@gmail.com>
> > To: dev@systemml.incubator.apache.org
> > Date: 05/17/2016 09:13 PM
> > Subject: Re: Discussion on GPU backend
> > ------------------------------
> >
> > Great to see detailed information on this topic Niketan, I guess I
> > missed it when you posted it initially.
> >
> > Could you elaborate a little more on what the programming model is when
> > the user wants to leverage GPU? Also, today I can submit a job to Spark
> > using --jars and it will handle copying the dependencies to the worker
> > nodes. If my application wants to leverage GPU, what extra dependencies
> > will be required on the worker nodes, and how are they going to be
> > installed/updated on the Spark cluster?
> >
> > On Tue, May 3, 2016 at 1:26 PM, Niketan Pansare <npan...@us.ibm.com>
> > wrote:
> >
> > > Hi all,
> > >
> > > I have updated the design document for our GPU backend in the JIRA
> > > https://issues.apache.org/jira/browse/SYSTEMML-445. The implementation
> > > details are based on the prototype I created, which is available in PR
> > > https://github.com/apache/incubator-systemml/pull/131. Once we are
> > > done with the discussion, I can clean up and separate out the GPU
> > > backend in a separate PR for easier review :)
> > >
> > > Here are the key design points:
> > > A GPU backend would implement two abstract classes:
> > > 1. GPUContext
> > > 2. GPUObject
> > >
> > > The GPUContext is responsible for GPU memory management and gets
> > > call-backs from SystemML's bufferpool on the following methods:
> > > 1. void acquireRead(MatrixObject mo)
> > > 2. void acquireModify(MatrixObject mo)
> > > 3. void release(MatrixObject mo, boolean isGPUCopyModified)
> > > 4. void exportData(MatrixObject mo)
> > > 5. void evict(MatrixObject mo)
> > >
> > > A GPUObject (like RDDObject and BroadcastObject) is stored in the
> > > CacheableData object. It contains the following methods that are
> > > called back from the corresponding GPUContext:
> > > 1. void allocateMemoryOnDevice()
> > > 2. void deallocateMemoryOnDevice()
> > > 3. long getSizeOnDevice()
> > > 4. void copyFromHostToDevice()
> > > 5. void copyFromDeviceToHost()
> > >
> > > In the initial implementation, we will add JCudaContext and
> > > JCudaPointer that will extend the above abstract classes respectively.
> > > The JCudaContext will be created by ExecutionContextFactory depending
> > > on the user-specified accelerator. Analogous to MR/SPARK/CP, we will
> > > add a new ExecType (GPU) and implement GPU instructions.
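[Editorial aside: for reference, the two abstract classes described immediately above, written out as a minimal Java sketch. The method signatures are copied verbatim from the lists in the design note; the package, modifiers, and everything else are assumptions, and the actual prototype is in PR 131 linked above. MatrixObject is SystemML's existing host-side matrix handle.]

    // Sketch only: signatures taken from the design note above; visibility,
    // extra members, and packaging are guesses.
    public abstract class GPUContext {
        // Call-backs invoked by SystemML's bufferpool.
        public abstract void acquireRead(MatrixObject mo);
        public abstract void acquireModify(MatrixObject mo);
        public abstract void release(MatrixObject mo, boolean isGPUCopyModified);
        public abstract void exportData(MatrixObject mo);
        public abstract void evict(MatrixObject mo);
    }

    // Stored in a CacheableData object (like RDDObject and BroadcastObject);
    // called back from the corresponding GPUContext.
    public abstract class GPUObject {
        public abstract void allocateMemoryOnDevice();
        public abstract void deallocateMemoryOnDevice();
        public abstract long getSizeOnDevice();
        public abstract void copyFromHostToDevice();
        public abstract void copyFromDeviceToHost();
    }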
> > >
> > > The above design is general enough so that other people can implement
> > > custom accelerators (for example, OpenCL), and it also follows the
> > > design principles of our CP bufferpool.
> > >
> > > Thanks,
> > >
> > > Niketan Pansare
> > > IBM Almaden Research Center
> > > E-mail: npansar At us.ibm.com
> > > http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
> >
> > --
> > Luciano Resende
> > http://twitter.com/lresende1975
> > http://lresende.blogspot.com/
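[Editorial aside: to illustrate the extensibility point at the end of Niketan's design note, a custom accelerator such as OpenCL would plug in by subclassing the same two abstractions. The OpenCLContext below is purely hypothetical (nothing like it exists in the codebase); the bodies only mark where backend-specific logic would live.]

    // Hypothetical OpenCL backend, mirroring what JCudaContext/JCudaPointer
    // do for CUDA. Method bodies are placeholders, not real OpenCL calls.
    public class OpenCLContext extends GPUContext {
        @Override public void acquireRead(MatrixObject mo)   { /* ensure device copy is current */ }
        @Override public void acquireModify(MatrixObject mo) { /* allocate device buffer for writing */ }
        @Override public void release(MatrixObject mo, boolean isGPUCopyModified) { /* unlock, mark dirty */ }
        @Override public void exportData(MatrixObject mo)    { /* copy device buffer back to host */ }
        @Override public void evict(MatrixObject mo)         { /* free device memory under pressure */ }
    }

[ExecutionContextFactory would then hand out this context when the user asks for that accelerator, just as the note describes for JCudaContext.]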