Re: GPU, lapack Matrix adaptations

Anand Avati Wed, 13 Aug 2014 15:40:47 -0700

On Fri, Jul 18, 2014 at 12:01 PM, Dmitriy Lyubimov <[email protected]>
wrote:

> On Fri, Jul 18, 2014 at 11:54 AM, Anand Avati <[email protected]> wrote:
>
> > On Fri, Jul 18, 2014 at 11:42 AM, Dmitriy Lyubimov <[email protected]>
> > wrote:
> >
> >
> > Coincidentally, I was wildly imagining/exploring integration with the
> > Fortran BLAS behind the in-core DSL using JNI. I had not come across
> > these BIDData projects. I'm happy to reorient that effort towards
> > exploring these.
> >
>
> Well, it's both: JBlas & JCublas. Shouldn't be too expensive.
>
> if i had to choose, i'd say integrate jCublas first, seems to be a bit of
> an edge here. We already know from Sebastien's work with jblas that its
> integration for sparse methods is not that interesting.
>
> However, even vector-vector operations over views of GPU-stored data
> become somewhat interesting in the context of DSL operators.
>

FYI, I was toying around with a jBLAS backend for Matrix / Vector (at
https://github.com/apache/mahout/pull/44). I started with jBLAS only because
I found better documentation for it. Testing a 1024x1024 multiplication of
random matrices on my laptop, I measured roughly a 56x faster runtime
(15900 ms vs. 284 ms):

Run starting. Expected test count is: 1
DiscoverySuite:
JBlasSuite:
Normal multiplication (ms) = 15900
jBLAS multiplication (ms) = 284
- matrix multiplication
Run completed in 16 seconds, 793 milliseconds.

This is a very trivial implementation with only matrix multiplication
optimized. Better vector integration is possible along the same lines.
However, for deeper integration (e.g. transparent offloading of
decompositions into jBLAS), some restructuring of the API would make things
simple and easy for consumers. What I mean is: instead of a public
CholeskyDecomposition(Matrix A) constructor, have a public
CholeskyDecomposition choleskydecompose() method in the Matrix interface.
This way JBlasMatrix can transparently plug in its own optimized
decomposition code and return it as an object of a class inherited from
CholeskyDecomposition.
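
To make the proposal concrete, here is a rough sketch of the factory-method
idea. Everything in it is a simplified, hypothetical stand-in: the Matrix
and CholeskyDecomposition types below are cut down to the bare minimum and
are not Mahout's real classes, PlainMatrix and JBlasCholeskyDecomposition
are names made up for illustration, and the "jblas" path just reuses the
plain-Java factorization where a real backend would call native LAPACK.

```java
// Hypothetical, minimal stand-in for the Matrix interface (square only).
interface Matrix {
    int size();
    double get(int row, int col);
    // Proposed factory method: each implementation picks its decomposition.
    CholeskyDecomposition choleskydecompose();
}

class CholeskyDecomposition {
    private final double[][] l; // lower-triangular factor, A = L * L^T

    protected CholeskyDecomposition(double[][] l) { this.l = l; }

    // Plain-Java Cholesky factorization; assumes a is symmetric positive definite.
    static double[][] factor(Matrix a) {
        int n = a.size();
        double[][] l = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j <= i; j++) {
                double sum = a.get(i, j);
                for (int k = 0; k < j; k++) {
                    sum -= l[i][k] * l[j][k];
                }
                l[i][j] = (i == j) ? Math.sqrt(sum) : sum / l[j][j];
            }
        }
        return l;
    }

    static CholeskyDecomposition of(Matrix a) {
        return new CholeskyDecomposition(factor(a));
    }

    double getL(int row, int col) { return l[row][col]; }

    String backend() { return "java"; }
}

// Hypothetical subclass returned by the jBLAS-backed matrix.
class JBlasCholeskyDecomposition extends CholeskyDecomposition {
    JBlasCholeskyDecomposition(Matrix a) {
        // Placeholder: a real backend would call into native code here.
        super(CholeskyDecomposition.factor(a));
    }

    @Override
    String backend() { return "jblas"; }
}

// Hypothetical default implementation using the plain-Java path.
class PlainMatrix implements Matrix {
    private final double[][] values;

    PlainMatrix(double[][] values) { this.values = values; }

    public int size() { return values.length; }

    public double get(int row, int col) { return values[row][col]; }

    public CholeskyDecomposition choleskydecompose() {
        return CholeskyDecomposition.of(this);
    }
}

// Overrides only the factory method; callers never see the difference.
class JBlasMatrix extends PlainMatrix {
    JBlasMatrix(double[][] values) { super(values); }

    @Override
    public CholeskyDecomposition choleskydecompose() {
        return new JBlasCholeskyDecomposition(this);
    }
}
```

A consumer would then write m.choleskydecompose() against the Matrix
interface and get the optimized variant whenever m happens to be a
JBlasMatrix, without naming any backend class itself.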

Comments/feedback welcome.

I also discovered other common code refactorings that can be done (iterator
code, non-zero iterator code, etc. is repeated in many places). I will send
separate PRs for those.
