If you're contemplating GPU acceleration in Spark, it's important to look beyond BLAS. Dense BLAS probably accounts for only 10% of the cycles on the datasets we've tested in BIDMach, and we've tried to make them representative of industry machine learning workloads. Unless you're crunching images or audio, the majority of data will be very sparse and power-law distributed. You need a good sparse BLAS, and in practice it seems you need a sparse BLAS tailored for power-law data. We had to write our own since the NVIDIA libraries didn't perform well on typical power-law data. The Intel MKL sparse BLAS routines also have issues, and we only use some of them.
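To make the sparse case concrete, here is a minimal pure-Python sketch of a CSR (compressed sparse row) matrix-vector product. It is only an illustration of the data layout, not BIDMat's kernel or any library's API; the function names and the toy matrix are made up for the example.

```python
# Illustrative CSR (compressed sparse row) matrix-vector product.
# Hypothetical helper names; not BIDMat's or any library's API.

def dense_to_csr(rows):
    """Convert a list of dense rows into CSR arrays (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in rows:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))  # row i spans row_ptr[i]:row_ptr[i + 1]
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A @ x while touching only the stored nonzeros."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Toy matrix with skewed row lengths: one heavy row, the rest nearly empty.
A = [
    [1, 2, 3, 4],
    [0, 5, 0, 0],
    [0, 0, 0, 0],
    [6, 0, 0, 0],
]
values, col_idx, row_ptr = dense_to_csr(A)
x = [1.0, 1.0, 1.0, 1.0]
y = csr_matvec(values, col_idx, row_ptr, x)

# Nonzeros per row; on power-law data this distribution is heavily skewed.
row_nnz = [row_ptr[i + 1] - row_ptr[i] for i in range(len(A))]
```

The uneven `row_nnz` is the part that matters here: a scheme that hands one row to each worker gets badly unbalanced when a few rows hold most of the nonzeros, which is one plausible reason a general-purpose sparse kernel can struggle on power-law data.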
You also need 2D reductions, scan operations, slicing, element-wise transcendental functions and operators, many kinds of sort, random number generators, etc., and some kind of memory management strategy. Some of this was layered on top of Thrust in BIDMat, but most had to be written from scratch. It's all been rooflined, typically to the memory throughput of current GPUs (around 200 GB/s).

When you have all this you can write learning algorithms in the same high-level primitives available in Breeze or NumPy/SciPy. It's literally the same in BIDMat, since the generic matrix operations are implemented on both CPU and GPU, so the same code runs on either platform.

A lesser-known fact is that GPUs are around 10x faster for *all* those operations, not just dense BLAS. It's mostly due to faster streaming memory speeds, but some kernels (random number generation and transcendentals) are more than an order of magnitude faster, thanks to some specialized hardware for power series on the GPU chip.

When you have all this there is no need to move data back and forth across the PCI bus. The CPU only has to pull chunks of data off disk, unpack them, and feed them to the available GPUs. Most models fit comfortably in GPU memory these days (4-12 GB). With minibatch algorithms you can push TBs of data through the GPU this way.
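The minibatch streaming pattern described above can be sketched in a few lines of plain Python. This is an assumption-laden illustration, not BIDMach code: ordinary lists stand in for GPU-resident arrays, a generator stands in for reading and unpacking chunks from disk, and the least-squares model, function names, and hyperparameters are all hypothetical.

```python
# Illustrative minibatch streaming loop. All names are hypothetical;
# plain Python lists stand in for device-resident arrays.

def read_minibatches(samples, batch_size):
    """Yield successive chunks, standing in for 'pull from disk and unpack'."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


def sgd_step(weights, batch, lr):
    """One least-squares gradient step computed from a single minibatch."""
    grad = [0.0] * len(weights)
    for features, target in batch:
        pred = sum(w * f for w, f in zip(weights, features))
        err = pred - target
        for j, f in enumerate(features):
            grad[j] += err * f
    scale = lr / len(batch)
    return [w - scale * g for w, g in zip(weights, grad)]


def train(samples, n_features, batch_size=4, lr=0.1, epochs=200):
    weights = [0.0] * n_features  # the model stays resident between batches
    for _ in range(epochs):
        for batch in read_minibatches(samples, batch_size):
            # Only the current batch is handed over; the model never leaves.
            weights = sgd_step(weights, batch, lr)
    return weights


# Toy noise-free data generated from y = 2*x0 - 1*x1.
samples = [((float(a), float(b)), 2.0 * a - 1.0 * b)
           for a in range(-2, 3) for b in range(-2, 3)]
weights = train(samples, n_features=2)
```

The point of the structure is that the only thing crossing from the reader to the update step is one batch at a time, so the total data volume is bounded by disk and streaming throughput rather than by how much fits next to the model.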