Sorry, this should say: it is true that sparse algebra is by far more compelling than dense one.
On Fri, Aug 15, 2014 at 9:54 AM, Dmitriy Lyubimov <[email protected]> wrote:

> As i indicated, i think it is a worthy move. As i said before (including
> the Spark list), it is true that dense algebra is by far more compelling
> than dense one; however, there are some considerations that make this work
> very much worthwhile. To sum up my motivations:
>
> (1) even in the methods currently put in, dense multiplications and
> decompositions are happening, and may actually speed things up in certain
> cases.
>
> (2) since the main idea is ease of customization, the consideration should
> be less about how useful it is for what's already inside and more about
> potential use. I have internally developed methods using that algebra
> that, by sheer number, outnumber those present in Mahout. Assuming other
> power users will do the same (which is still largely just a hope at this
> point), we'd just look like cavemen if we do not provide jCuda and jBlas
> bindings.
>
> So that sums up the motivation.
>
> Re: the pull request. So that's a good start.
>
> As was mentioned in previous discussions, we are lacking a cost-based
> optimizer for binary matrix operators, the same way it was done for
> vectors.
>
> E.g. we need some sort of generic entry point into matrix-matrix
> operations that will make specific algorithm selection based on operand
> types. For sparse types, some algorithms were already added by Ted, but
> they were not connected to this decision tree properly. For dense types,
> we will probably need to run some empiric cost calibration analysis (i.e.
> if arg A has native T multiplication and arg B does not, will it be faster
> to convert B to native T and proceed natively, or vice versa, given
> geometry and number of elements, etc.). Imo this stuff has pretty unique
> architectural opportunities for matrix-centric operations.
>
> On another note, i think it is not worthwhile to support lapack/cuda
> operations for vectors.
>
>
> On Wed, Aug 13, 2014 at 3:39 PM, Anand Avati <[email protected]> wrote:
>
>> On Fri, Jul 18, 2014 at 12:01 PM, Dmitriy Lyubimov <[email protected]>
>> wrote:
>>
>> > On Fri, Jul 18, 2014 at 11:54 AM, Anand Avati <[email protected]>
>> > wrote:
>> >
>> > > On Fri, Jul 18, 2014 at 11:42 AM, Dmitriy Lyubimov
>> > > <[email protected]> wrote:
>> > >
>> > > Coincidentally, I was wildly imagining/exploring integration with
>> > > the Fortran BLAS behind the in-core DSL using JNI. I had not come
>> > > across these BIDData projects. I'm happy to reorient that effort
>> > > towards exploring these.
>> >
>> > Well, it's both, JBlas & JCublas; neither should be too expensive.
>> >
>> > If i had to choose, i'd say integrate jCublas first; it seems to have
>> > a bit of an edge here. We already know from Sebastien's work with
>> > jblas that its integration for sparse methods is not that interesting.
>> >
>> > However, even vector-vector operations over views of gpu-stored data
>> > become somewhat interesting in the context of DSL operators.
>>
>> FYI, I was toying around with a jBLAS backend for Matrix / Vector (at
>> https://github.com/apache/mahout/pull/44). Started with jBLAS only
>> because I found better documentation. Testing a 1024x1024 matrix
>> multiplication of random numbers on my laptop, I found a solid 56x
>> faster runtime:
>>
>> Run starting. Expected test count is: 1
>> DiscoverySuite:
>> JBlasSuite:
>> Normal multiplication (ms) = 15900
>> jBLAS multiplication (ms) = 284
>> - matrix multiplication
>> Run completed in 16 seconds, 793 milliseconds.
>>
>> This is a very trivial implementation with only matrix multiplication
>> optimized. Better vector integration is possible along the same steps.
>> However, for deeper integration (e.g. transparent offloading of
>> decompositions into jBLAS), some restructuring of the API will make it
>> simple and easy for consumers. For example, what I mean is: instead of a
>> public CholeskyDecomposition(Matrix A) constructor, have a public
>> CholeskyDecomposition choleskydecompose() method in the Matrix interface.
>> This way JBlasMatrix can transparently insert its own optimized
>> decomposition code and return it as an inherited object of the
>> CholeskyDecomposition class.
>>
>> Comments/feedback welcome.
>>
>> I also discovered there is other common code refactoring which can be
>> done (iterator, non-zero iterator code, etc. repeated in many places);
>> separate PRs for them.
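
On the decomposition restructuring above, here is a minimal Scala sketch of the
proposed shape. The choleskyDecompose() method, the Decompositions trait and
JBlasMatrix are hypothetical names; Mahout today only has the
CholeskyDecomposition(Matrix) constructor, so treat this as an illustration of
the API direction rather than working backend code:

import org.apache.mahout.math.{CholeskyDecomposition, DenseMatrix, Matrix}

trait Decompositions { this: Matrix =>
  // default path: the existing pure-Java decomposition
  def choleskyDecompose(): CholeskyDecomposition = new CholeskyDecomposition(this)
}

// hypothetical jBLAS-backed matrix that transparently substitutes its own path
// while callers still receive (a subclass of) CholeskyDecomposition
class JBlasMatrix(rows: Int, columns: Int)
    extends DenseMatrix(rows, columns) with Decompositions {

  override def choleskyDecompose(): CholeskyDecomposition =
    // a real implementation would return a CholeskyDecomposition subclass whose
    // getL() and solve methods call into jBLAS/LAPACK; simplified to the default here
    super.choleskyDecompose()
}

The design point is that the matrix type itself owns the choice of decomposition
implementation, so backend dispatch stays invisible to callers.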
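
Dmitriy's point about a generic, cost-aware entry point for matrix-matrix
operations could look roughly like the following. The MatMul object and its
cases are purely a sketch of the dispatch structure (no such code exists in
Mahout); the native kernels and the calibrated cost model would still have to
be written:

import org.apache.mahout.math.{DenseMatrix, Matrix, SparseRowMatrix}

object MatMul {

  def apply(a: Matrix, b: Matrix): Matrix = (a, b) match {

    // both operands dense: the natural place to offload to a native backend
    // (jBLAS dgemm, or jCublas once a GPU-resident matrix type exists)
    case (da: DenseMatrix, db: DenseMatrix) => nativeTimes(da, db)

    // one sparse operand: prefer an algorithm that iterates only over non-zeros;
    // whether to densify the other operand first is exactly the kind of decision
    // an empirically calibrated cost model (geometry, non-zero counts, presence
    // of a native kernel) would make
    case (sa: SparseRowMatrix, _) => sa.times(b)
    case (_, sb: SparseRowMatrix) => a.times(sb)

    // generic fallback: the existing row-by-row Java implementation
    case _ => a.times(b)
  }

  // placeholder for a jBLAS/jCublas-backed kernel; plain times() until one exists
  private def nativeTimes(a: DenseMatrix, b: DenseMatrix): Matrix = a.times(b)
}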
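
For reference, the 1024x1024 comparison quoted above can be approximated with
something like the snippet below. This is not the actual JBlasSuite test from
the PR, just Mahout's DenseMatrix.times() timed against jBLAS's mmul();
absolute numbers will vary with the machine and the native BLAS build:

import org.apache.mahout.math.DenseMatrix
import org.jblas.DoubleMatrix
import scala.util.Random

object MulBench extends App {
  val n = 1024

  // Mahout in-core dense matrices filled with random doubles
  val a = new DenseMatrix(n, n)
  val b = new DenseMatrix(n, n)
  for (r <- 0 until n; c <- 0 until n) {
    a.setQuick(r, c, Random.nextDouble())
    b.setQuick(r, c, Random.nextDouble())
  }

  var t = System.currentTimeMillis()
  a.times(b)
  println(s"Normal multiplication (ms) = ${System.currentTimeMillis() - t}")

  // jBLAS equivalents; mmul() goes through the native BLAS dgemm
  val ja = DoubleMatrix.rand(n, n)
  val jb = DoubleMatrix.rand(n, n)
  t = System.currentTimeMillis()
  ja.mmul(jb)
  println(s"jBLAS multiplication (ms) = ${System.currentTimeMillis() - t}")
}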
