On Wed, Jul 24, 2013 at 11:47 AM, Reynold Xin <[email protected]> wrote:
> On Wed, Jul 24, 2013 at 1:46 AM, Nick Pentreath <[email protected]> wrote:
>
> > I also found Breeze to be very nice to work with and like the DSL -
> > hence my question about why not use that? (Especially now that Breeze
> > is actually just breeze-math and breeze-viz.)
>
> Matei addressed this from a higher level. I want to provide a little
> bit more context. A common property of a lot of high-level Scala DSL
> libraries is that simple operators tend to have high virtual-function
> overhead and also create a lot of temporary objects. And because the
> level of abstraction is so high, it is fairly hard to debug / optimize
> performance.

I was *kinda* worried about that too. But, as often happens, it seems to
me we are worrying about something that will never compare with the bulk
computation. Consider this fragment (it comes from one of the flavors of
weighted ALS with weighted regularization):

    val cholArg = icVtV + (vBlockForC.t %*%: diagv(d)) %*% vBlockForC +
      diag(n_u * lambda, k)

Yes, we just created a few object references for the GC via Scala
implicit conversions, while millions of flops went to the FPU behind the
scenes. Meanwhile we also got an optimized left-multiply by a diagonal
matrix (the %*%: operator) and made use of symmetric-matrix
optimizations, in a very Scala way. And it looks just like what R folks
would understand. The benefits of the DSL clearly outweigh whatever
claimed overhead exists, IMO. I can't deny I find it subjectively more
elegant than

    vblock.transpose().times(... new DiagonalMatrix(n_u * lambda, k) ...)

(I have appended a few quick sketches at the end of this mail to make
these points, and the two points below, concrete.)

As far as Mahout abstraction quality goes (and I am assuming we are
talking about in-core linear algebra support here, because there is much
more to Mahout than that), that is debatable, but it is exactly why I
started doing the DSL in the first place. The DSL should iron a lot of
that out, as we have seen, and bring it closer to R/MATLAB look & feel.

But there are other important factors about Mahout's in-core support. I
did my honest homework for my project trying to pick an in-core linear
algebra library, and I was not stuck on Mahout's in-core support at all;
I actually really wanted to find something a bit more mature. In my
search, I failed to find a project that addresses the following two
major problems for in-core BLAS:

1) Naturally embedded support for sparse matrices, with optimizations
aimed at the degenerate nature of zero elements. No other project does
this to the same degree: Apache Commons Math sparse matrices are
deprecated and said to be broken; JBLAS/LAPACK has no degenerate-element
optimizations at all; Breeze lacks consistency in its Matrix abstraction
between sparse and dense matrices; and so on.

2) As an extension of (1), a wide range of matrix types optimized for
specific solver computations: diagonal, upper/lower triangular,
symmetric, parsimonious, pivoted, row-wise vs. column-wise vs.
open-addressed sparse matrices, and so on, especially with the latest
effort there. Nobody came close to that variety and ease of
sparse-operation optimization in my (however brief) search. It is kind
of raw at times, but nothing I can't handle.

But I totally agree that any such environment is not part of Spark. It
makes some pragmatic tasks very addressable, though, and I can see a
roadmap where I could freely mix Mahout's distributed solvers with
Spark's until I have a chance to port/create more of what I need on the
Spark side, without any additional format/conversion issues.
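First, a toy that makes the quoted temporaries/virtual-call concern
concrete. This is a deliberately naive sketch, not Breeze's or Mahout's
actual code: every + application is a virtual call that allocates a
fresh result.

    // Naive operator overloading: one temporary allocation per operator.
    class NaiveVec(val data: Array[Double]) {
      def +(that: NaiveVec): NaiveVec = {
        val out = new Array[Double](data.length)
        var i = 0
        while (i < data.length) {
          out(i) = data(i) + that.data(i)
          i += 1
        }
        new NaiveVec(out)
      }
    }

    val a = new NaiveVec(Array(1.0, 2.0))
    val b = new NaiveVec(Array(3.0, 4.0))
    val c = new NaiveVec(Array(5.0, 6.0))
    // a + b + c allocates two temporaries: (a + b), then the final sum.
    val s = a + b + c

That overhead is real, but it amounts to a handful of O(k^2)-sized
temporaries sitting next to the O(k^3) flops of the matrix multiplies,
which is exactly why I think the bulk computation dominates.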
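Second, for context, roughly the step the cholArg fragment sits in. This
is a sketch only: the names p_u, cu_pu, rhs and x_u are illustrative,
and I am assuming an R-like solve() binding in the DSL, so do not read
it as the actual solver code.

    // Per user u, weighted ALS solves the regularized normal equations
    //   (V' C_u V + n_u * lambda * I) x_u = V' C_u p_u
    // where C_u = I + diag(d) carries the confidence weights and n_u is
    // the user's interaction count (weighted regularization). icVtV
    // caches the V'V term shared by all users.
    val cholArg = icVtV + (vBlockForC.t %*%: diagv(d)) %*% vBlockForC +
      diag(n_u * lambda, k)
    val cu_pu = p_u + (d * p_u)         // (I + diag(d)) p_u, elementwise
    val rhs = vBlockForC.t %*% cu_pu    // V' C_u p_u
    val x_u = solve(cholArg, rhs)       // assumed Cholesky-backed solve()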
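Third, what I mean in point (1) by optimizations aimed at the degenerate
nature of zero elements, using mahout-math's existing in-core types.
Iteration visits only the stored cells, so a dot product over a vector
with a million slots but two nonzeros costs two multiply-adds:

    import org.apache.mahout.math.{DenseVector, RandomAccessSparseVector}

    // Open-addressed (hashed) sparse storage with huge cardinality.
    val v = new RandomAccessSparseVector(1000000)
    v.setQuick(42, 3.0)
    v.setQuick(999983, -1.5)

    val w = new DenseVector(1000000)
    w.setQuick(42, 2.0)

    var dot = 0.0
    val it = v.iterateNonZero()   // skips the ~10^6 structural zeros
    while (it.hasNext) {
      val e = it.next()
      dot += e.get() * w.getQuick(e.index())
    }
    // dot == 6.0; v.dot(w) should pick a sparse-aware loop the same way.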
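And finally, point (2): the variety of structured in-core types. The
DiagonalMatrix below uses the same constructor as in the times() example
above; mahout-math also carries upper-triangular, pivoted and symmetric
types, among others. A minimal sketch, with n_u, lambda and vtv standing
in for real values:

    import org.apache.mahout.math.{DenseMatrix, DiagonalMatrix}

    val k = 5
    val n_u = 30
    val lambda = 0.1
    // Stores k diagonal values, not a k x k array.
    val reg = new DiagonalMatrix(n_u * lambda, k)
    val vtv = new DenseMatrix(k, k)   // stand-in for a computed V'V
    val cholArg = vtv.plus(reg)       // plain Matrix API, no DSL needed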
> --
> Reynold Xin, AMPLab, UC Berkeley
> http://rxin.org