On Fri, Jul 18, 2014 at 12:01 PM, Dmitriy Lyubimov <[email protected]> wrote:
> On Fri, Jul 18, 2014 at 11:54 AM, Anand Avati <[email protected]> wrote:
>
> > On Fri, Jul 18, 2014 at 11:42 AM, Dmitriy Lyubimov <[email protected]>
> > wrote:
> >
> > Coincidentally I was wildly imagining/exploring integration with the
> > fortran blas behind the in-core DSL using jni. I had not come across
> > these BIDData projects. I'm happy to reorient that effort towards
> > exploring these.
>
> Well, it's both. JBlas & JCublas. should be too expensive.
>
> if i had to choose, i'd say integrate jCublas first, seems to be a bit of
> an edge here. We already know from Sebastien's work with jblas that its
> integration for sparse methods is not that interesting.
>
> However, even vector-vector operations over views of gpu-stored data
> become somewhat interesting in context of dsl operators.

FYI, I was toying around with a jBLAS backend for Matrix / Vector (at
https://github.com/apache/mahout/pull/44). I started with jBLAS only because
I found better documentation for it. Testing a 1024x1024 matrix
multiplication of random numbers on my laptop, I found a solid 56x faster
runtime:

    Run starting. Expected test count is: 1
    DiscoverySuite:
    JBlasSuite:
    Normal multiplication (ms) = 15900
    jBLAS multiplication (ms) = 284
    - matrix multiplication
    Run completed in 16 seconds, 793 milliseconds.

This is a very trivial implementation with only matrix multiplication
optimized. Better vector integration is possible along the same lines.
However, for deeper integration (e.g. transparent offloading of
decompositions into jBLAS), some restructuring of the API would make things
simple and easy for consumers. For example, instead of a
public CholeskyDecomposition(Matrix A) constructor, have a
public CholeskyDecomposition choleskydecompose() method in the Matrix
interface. This way JBlasMatrix can transparently insert its own optimized
decomposition code and return it as an object of a class that inherits from
CholeskyDecomposition.

Comments/feedback welcome.
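To make the factory-method idea above a bit more concrete, here is a rough, self-contained sketch. It is not the actual Mahout API: the Matrix interface is reduced to a single toArray() accessor, and NativeBackedMatrix / NativeCholeskyDecomposition are hypothetical stand-ins for JBlasMatrix and its optimized decomposition (the "native" path just reuses the plain Java routine as a placeholder).

```java
public class DecompositionSketch {

    /** Simplified stand-in for the Matrix interface (assumption, not Mahout's). */
    public interface Matrix {
        double[][] toArray();

        /** Proposed factory method: each backend decides which decomposition to return. */
        default CholeskyDecomposition choleskydecompose() {
            return new CholeskyDecomposition(this);
        }
    }

    /** Reference decomposition: computes lower-triangular L with A = L * L^T. */
    public static class CholeskyDecomposition {
        private final double[][] l;

        /** Current style: the caller constructs the decomposition directly. */
        public CholeskyDecomposition(Matrix a) {
            this(factor(a.toArray()));
        }

        /** For subclasses that compute the factor by other means. */
        protected CholeskyDecomposition(double[][] l) {
            this.l = l;
        }

        public double[][] getL() { return l; }

        public String backend() { return "java"; }

        /** Plain Java Cholesky; assumes the input is symmetric positive definite. */
        static double[][] factor(double[][] a) {
            int n = a.length;
            double[][] l = new double[n][n];
            for (int i = 0; i < n; i++) {
                for (int j = 0; j <= i; j++) {
                    double sum = a[i][j];
                    for (int k = 0; k < j; k++) {
                        sum -= l[i][k] * l[j][k];
                    }
                    if (i == j) {
                        if (sum <= 0) {
                            throw new IllegalArgumentException("matrix is not positive definite");
                        }
                        l[i][i] = Math.sqrt(sum);
                    } else {
                        l[i][j] = sum / l[j][j];
                    }
                }
            }
            return l;
        }
    }

    /** Hypothetical optimized decomposition; a real one would call into native code. */
    public static class NativeCholeskyDecomposition extends CholeskyDecomposition {
        public NativeCholeskyDecomposition(double[][] a) {
            super(factor(a)); // placeholder for the native call
        }

        @Override
        public String backend() { return "native (placeholder)"; }
    }

    /** Default in-core matrix: inherits the default factory method. */
    public static class DenseMatrix implements Matrix {
        private final double[][] values;

        public DenseMatrix(double[][] values) { this.values = values; }

        @Override
        public double[][] toArray() { return values; }
    }

    /** Hypothetical stand-in for JBlasMatrix: overrides the factory method only. */
    public static class NativeBackedMatrix extends DenseMatrix {
        public NativeBackedMatrix(double[][] values) { super(values); }

        @Override
        public CholeskyDecomposition choleskydecompose() {
            return new NativeCholeskyDecomposition(toArray());
        }
    }

    public static void main(String[] args) {
        double[][] a = {{4, 2}, {2, 5}};
        Matrix[] inputs = {new DenseMatrix(a), new NativeBackedMatrix(a)};
        for (Matrix m : inputs) {
            // Consumer code is identical for both backends.
            CholeskyDecomposition d = m.choleskydecompose();
            System.out.println(d.backend() + ": L[1][1] = " + d.getL()[1][1]);
        }
    }
}
```

The point of the sketch is that consumer code only ever calls m.choleskydecompose() and works against the CholeskyDecomposition type, so a backend can swap in its own implementation without callers changing.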
I also discovered other common code that could be refactored (iterator and non-zero iterator code, etc., is repeated in many places); I'll send separate PRs for those.
