Yes, obviously this works well only for dense matrices. I had even contemplated inheriting JBlasMatrix from DenseMatrix.
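To make that concrete, here is a rough sketch of the shape such a subclass could take. This is only an illustration, not the code in the PR; it assumes Mahout's Matrix/DenseMatrix API (rowSize(), columnSize(), getQuick(), setQuick(), times()) and jBLAS's org.jblas.DoubleMatrix, and it skips dimension checks:

import org.apache.mahout.math.DenseMatrix;
import org.apache.mahout.math.Matrix;
import org.jblas.DoubleMatrix;

// Sketch only: a dense Matrix whose multiplication is offloaded to jBLAS.
public class JBlasMatrix extends DenseMatrix {

  public JBlasMatrix(int rows, int columns) {
    super(rows, columns);
  }

  @Override
  public Matrix times(Matrix other) {
    // Copy both operands across the JNI boundary, multiply natively, copy back.
    // (This copy cost is exactly the small-matrix overhead Ted mentions below.)
    DoubleMatrix a = toJBlas(this);
    DoubleMatrix b = toJBlas(other);
    DoubleMatrix c = a.mmul(b);
    Matrix result = new DenseMatrix(c.rows, c.columns);
    for (int i = 0; i < c.rows; i++) {
      for (int j = 0; j < c.columns; j++) {
        result.setQuick(i, j, c.get(i, j));
      }
    }
    return result;
  }

  // Dense copy of an arbitrary Mahout Matrix into a jBLAS buffer.
  private static DoubleMatrix toJBlas(Matrix m) {
    DoubleMatrix d = new DoubleMatrix(m.rowSize(), m.columnSize());
    for (int i = 0; i < m.rowSize(); i++) {
      for (int j = 0; j < m.columnSize(); j++) {
        d.put(i, j, m.getQuick(i, j));
      }
    }
    return d;
  }
}

A real integration would presumably keep the data in jBLAS's own layout instead of copying per call, which is where inheriting from DenseMatrix (and owning the backing array) becomes interesting.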
Is there a roadmap (or a collection of thoughts that approximates a roadmap), so that there is some sort of guideline as to which lines of investigation for contributions make sense?

On Wed, Aug 13, 2014 at 5:06 PM, Ted Dunning <[email protected]> wrote:

> Now try multiplying a 1 million by 1 million sparse matrix with 100
> non-zeros in each row by another such matrix.
>
> Also try a 16k x 16k dense matrix.
>
> And a 10 x 10 dense matrix.
>
> The moral is that jBlas and similar things are great for medium-sized
> dense matrices. Sparse systems aren't helped. Large dense systems have
> problems on GPUs but work great with native BLAS. Small dense systems
> have problems with JNI boundaries and GPU memory architectures.
>
> So far, much of the Mahout work has been large sparse systems, so it has
> been worthwhile to build a sparse optimizer, but not so very worthwhile
> to build fancy stuff for the dense cases.
>
> That may have changed with the higher profile of things like ALS and
> random projection decompositions. Even k-means can be recast using random
> projection to be a dense-matrix-heavy algorithm.
>
> What do you think is the right course?
>
>
> On Wed, Aug 13, 2014 at 3:39 PM, Anand Avati <[email protected]> wrote:
>
> > On Fri, Jul 18, 2014 at 12:01 PM, Dmitriy Lyubimov <[email protected]>
> > wrote:
> >
> > > On Fri, Jul 18, 2014 at 11:54 AM, Anand Avati <[email protected]>
> > > wrote:
> > >
> > > > On Fri, Jul 18, 2014 at 11:42 AM, Dmitriy Lyubimov
> > > > <[email protected]> wrote:
> > > >
> > > > Coincidentally I was wildly imagining/exploring integration with
> > > > the fortran blas behind the in-core DSL using jni. I had not come
> > > > across these BIDData projects. I'm happy to reorient that effort
> > > > towards exploring these.
> > >
> > > Well, it's both. JBlas & JCublas. should be too expensive.
> > >
> > > if i had to choose, i'd say integrate jCublas first, seems to be a
> > > bit of an edge here. We already know from Sebastien's work with jblas
> > > that its integration for sparse methods is not that interesting.
> > >
> > > However, even vector-vector operations over views of gpu-stored data
> > > become somewhat interesting in context of dsl operators.
> >
> > FYI, I was toying around with a jBLAS backend for Matrix / Vector (at
> > https://github.com/apache/mahout/pull/44). Started with jBLAS only
> > because I found better documentation. Testing a 1024x1024 matrix
> > multiplication of random numbers on my laptop, I found a solid 56x
> > faster runtime:
> >
> > Run starting. Expected test count is: 1
> > DiscoverySuite:
> > JBlasSuite:
> > Normal multiplication (ms) = 15900
> > jBLAS multiplication (ms) = 284
> > - matrix multiplication
> > Run completed in 16 seconds, 793 milliseconds.
> >
> > This is a very trivial implementation with only matrix multiplication
> > optimized. Better vector integration is possible along the same lines.
> > However, for deeper integration (e.g. transparent offloading of
> > decompositions into jblas), some restructuring of the API would make it
> > simple and easy for consumers. For example, what I mean is: instead of
> > a public CholeskyDecomposition(Matrix A) constructor, have a public
> > CholeskyDecomposition choleskydecompose() method in the Matrix
> > interface. This way JBlasMatrix can transparently insert its own
> > optimized decomp code and return it as an inherited object of the
> > CholeskyDecomposition class.
> >
> > Comments/feedback welcome.
> > I also discovered there are other common code refactorings which can
> > be done (iterator, non-zero iterator code etc. repeated in many
> > places) - separate PRs for them.
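Coming back to the decomposition idea in my quoted mail above, here is a minimal sketch of the dispatch it would enable. The types below are simplified stand-ins rather than the real org.apache.mahout.math classes, and org.jblas.Decompose.cholesky() is only mentioned in a comment as the likely native entry point:

// Model of the proposed API shape: the decomposition is requested from the
// matrix itself, so a backend can substitute an optimized implementation
// while callers still get a CholeskyDecomposition back.

interface Matrix {
  CholeskyDecomposition choleskyDecompose();
}

class CholeskyDecomposition {
  // The existing pure-Java factorization would live behind this type.
}

class DenseMatrix implements Matrix {
  @Override
  public CholeskyDecomposition choleskyDecompose() {
    return new CholeskyDecomposition();        // default Java code path
  }
}

class JBlasCholeskyDecomposition extends CholeskyDecomposition {
  // Would wrap a factor computed natively, e.g. via org.jblas.Decompose.cholesky().
}

class JBlasMatrix extends DenseMatrix {
  @Override
  public CholeskyDecomposition choleskyDecompose() {
    return new JBlasCholeskyDecomposition();   // optimized decomp, same return type
  }
}

Consumers would then just write a.choleskyDecompose() and get whichever implementation the concrete Matrix provides, instead of constructing the decomposition themselves.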
