Dmitriy,

This is very pretty.
On Mon, Jun 24, 2013 at 6:48 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:

> Ok, so I was fairly easily able to build some DSL for our matrix
> manipulation (similar to Breeze) in Scala.
>
> Inline matrix or vector:
>
>     val a = dense((1, 2, 3), (3, 4, 5))
>
>     val b: Vector = (1, 2, 3)
>
> Block views and assignments (element/row/vector/block/block of row or
> vector):
>
>     a(::, 0)
>     a(1, ::)
>     a(0 to 1, 1 to 2)
>
> Assignments:
>
>     a(0, ::) := (3, 5, 7)
>     a(0, 0 to 1) := (3, 5)
>     a(0 to 1, 0 to 1) := dense((1, 1), (2, 2.5))
>
> Operators:
>
>     // Hadamard
>     val c = a * b
>     a *= b
>
>     // matrix multiplication
>     val m = a %*% b
>
> and a bunch of other little things like sum, mean, colMeans, etc. That
> much is easy.
>
> Also stuff like the ones found in Breeze, along the lines of
>
>     val (u, v, s) = svd(a)
>
>     diag((1, 2, 3))
>
> and Cholesky in similar ways.
>
> I don't have "inline" initialization for sparse things (yet), simply
> because I don't need them, but of course all the regular Java
> constructors and methods are retained; all of this is just syntactic
> sugar in the spirit of DSLs, in the hope of making things a bit more
> readable.
>
> My (very little, and very insignificantly opinionated, really) criticism
> of Breeze in this context is its inconsistency between dense and sparse
> representations, namely the lack of consistent overarching trait(s), so
> that building structure-agnostic solvers like Mahout's Cholesky solver
> is impossible, as is cross-type matrix use (say, the way I understand
> it, it is pretty much impossible to multiply a sparse matrix by a dense
> matrix).
>
> I suspect these problems stem from the fact that the authors, for
> whatever reason, decided to hardwire dense things to JBlas solvers,
> whereas I don't believe matrix storage structures must be. But these
> problems do appear to be serious enough for me to ignore Breeze for now.
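[Editor's note: the structure-agnostic point above can be illustrated with a toy sketch. This is not Mahout's or Breeze's actual API; `SimpleMatrix`, `DenseImpl`, and `SparseImpl` are invented names for illustration. The point it shows: with one shared interface (trait), a single multiply routine accepts any mix of dense and sparse arguments, which is exactly what a hardwired dense-only type hierarchy rules out.]

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch (invented names, not Mahout's actual Matrix API): one common
// interface shared by dense and sparse implementations, so a single multiply
// routine covers dense*dense, sparse*dense, and every other combination.
interface SimpleMatrix {
    int rows();
    int cols();
    double get(int r, int c);
    void set(int r, int c, double v);
}

class DenseImpl implements SimpleMatrix {
    private final double[][] data;
    DenseImpl(int rows, int cols) { data = new double[rows][cols]; }
    public int rows() { return data.length; }
    public int cols() { return data[0].length; }
    public double get(int r, int c) { return data[r][c]; }
    public void set(int r, int c, double v) { data[r][c] = v; }
}

class SparseImpl implements SimpleMatrix {
    private final int rows, cols;
    private final Map<Long, Double> cells = new HashMap<>();
    SparseImpl(int rows, int cols) { this.rows = rows; this.cols = cols; }
    public int rows() { return rows; }
    public int cols() { return cols; }
    public double get(int r, int c) { return cells.getOrDefault((long) r * cols + c, 0.0); }
    public void set(int r, int c, double v) { cells.put((long) r * cols + c, v); }
}

public class StructureAgnostic {
    // Structure-agnostic: only the interface is used, so any mix of
    // implementations works.
    static SimpleMatrix times(SimpleMatrix a, SimpleMatrix b) {
        SimpleMatrix m = new DenseImpl(a.rows(), b.cols());
        for (int i = 0; i < a.rows(); i++)
            for (int j = 0; j < b.cols(); j++) {
                double s = 0;
                for (int k = 0; k < a.cols(); k++) s += a.get(i, k) * b.get(k, j);
                m.set(i, j, s);
            }
        return m;
    }

    public static void main(String[] args) {
        SimpleMatrix sparse = new SparseImpl(2, 2);
        sparse.set(0, 0, 2); sparse.set(1, 1, 3);   // diag(2, 3), stored sparsely
        SimpleMatrix dense = new DenseImpl(2, 2);
        dense.set(0, 0, 1); dense.set(0, 1, 2);
        dense.set(1, 0, 3); dense.set(1, 1, 4);
        SimpleMatrix c = times(sparse, dense);      // sparse %*% dense just works
        System.out.println(c.get(0, 1) + " " + c.get(1, 0)); // 4.0 9.0
    }
}
```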
> If I decide to plug in JBlas dense solvers, I guess I will just have
> them as yet another top-level routine interface taking any Matrix, e.g.
>
>     val (u, v, s) = svd(m, jblas = true)
>
> On Sun, Jun 23, 2013 at 7:08 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>
>> Thank you.
>>
>> On Jun 23, 2013 6:16 PM, "Ted Dunning" <ted.dunn...@gmail.com> wrote:
>>
>>> I think that this contract has migrated a bit from the first starting
>>> point.
>>>
>>> My feeling is that there is a de facto contract now that the matrix
>>> slice is a single row.
>>>
>>> Sent from my iPhone
>>>
>>> On Jun 23, 2013, at 16:32, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>>>
>>>> What does Matrix.iterateAll() contractually do? In practice it seems
>>>> to be row-wise iteration for some implementations, but it doesn't
>>>> seem to state so contractually in the javadoc. What is a MatrixSlice
>>>> if it is neither a row nor a column? How can I tell what exactly it
>>>> is I am iterating over?
>>>>
>>>> On Jun 19, 2013 12:21 AM, "Ted Dunning" <ted.dunn...@gmail.com> wrote:
>>>>
>>>>> On Wed, Jun 19, 2013 at 5:29 AM, Jake Mannix <jake.man...@gmail.com> wrote:
>>>>>
>>>>>>> Question #2: which in-core solvers are available for Mahout
>>>>>>> matrices? I know there's SSVD, and probably Cholesky; is there
>>>>>>> something else? In particular, I need to be solving linear
>>>>>>> systems; I guess Cholesky should be equipped enough to do just
>>>>>>> that?
>>>>>>>
>>>>>>> Question #3: why did we try to import Colt's solvers rather than
>>>>>>> actually depend on Colt in the first place? Why did we not accept
>>>>>>> Colt's sparse matrices and create native ones instead?
>>>>>>>
>>>>>>> Colt seems to have a notion of sparse in-core matrices too, and
>>>>>>> seems like a well-rounded solution. However, it doesn't seem to
>>>>>>> be actively supported, whereas I know Mahout's in-core matrix
>>>>>>> support has seen continued enhancements.
>>>>>>
>>>>>> Colt was totally abandoned, and I talked to the original author,
>>>>>> and he blessed its adoption. When we pulled it in, we found it was
>>>>>> woefully undertested, and we tried our best to hook it in with
>>>>>> proper tests and use APIs that fit with the use cases we had.
>>>>>> Plus, we already had the start of some linear APIs (i.e. the
>>>>>> Vector interface), and dropping that API completely seemed not
>>>>>> terribly worth it at the time.
>>>>>
>>>>> There was even more to it than that.
>>>>>
>>>>> Colt was under-tested, and there have been warts that had to be
>>>>> pulled out in much of the code.
>>>>>
>>>>> But, worse than that, Colt's matrix and vector structure was a real
>>>>> bugger to extend or change. It also had all kinds of cruft where it
>>>>> pretended to support matrices of things, but in fact only supported
>>>>> matrices of doubles and floats.
>>>>>
>>>>> So using Colt as it was (and is, since it is largely abandoned) was
>>>>> a non-starter.
>>>>>
>>>>> As far as in-memory solvers, we have:
>>>>>
>>>>> 1) LR decomposition (tested and kinda fast)
>>>>>
>>>>> 2) Cholesky decomposition (tested)
>>>>>
>>>>> 3) SVD (tested)
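[Editor's note: on the "Cholesky should be equipped enough to solve linear systems" question above: yes, for symmetric positive-definite A, a Cholesky factorization A = L·Lᵀ followed by forward and back substitution solves Ax = b. Below is a plain-Java textbook sketch of that algorithm; it is not Mahout's CholeskyDecomposition class, and the class and method names here are invented for illustration.]

```java
// Textbook Cholesky solve for a symmetric positive-definite system Ax = b.
// A sketch of the algorithm only, not Mahout's CholeskyDecomposition API.
public class CholeskySolve {
    // Returns lower-triangular L with A = L * L^T (A must be SPD).
    static double[][] cholesky(double[][] a) {
        int n = a.length;
        double[][] l = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j <= i; j++) {
                double s = a[i][j];
                for (int k = 0; k < j; k++) s -= l[i][k] * l[j][k];
                l[i][j] = (i == j) ? Math.sqrt(s) : s / l[j][j];
            }
        }
        return l;
    }

    // Solve Ax = b: forward-substitute L y = b, then back-substitute L^T x = y.
    static double[] solve(double[][] a, double[] b) {
        int n = b.length;
        double[][] l = cholesky(a);
        double[] y = new double[n];
        for (int i = 0; i < n; i++) {
            double s = b[i];
            for (int k = 0; k < i; k++) s -= l[i][k] * y[k];
            y[i] = s / l[i][i];
        }
        double[] x = new double[n];
        for (int i = n - 1; i >= 0; i--) {
            double s = y[i];
            for (int k = i + 1; k < n; k++) s -= l[k][i] * x[k];
            x[i] = s / l[i][i];
        }
        return x;
    }

    public static void main(String[] args) {
        double[][] a = {{4, 2}, {2, 3}};   // symmetric positive-definite
        double[] b = {10, 8};
        double[] x = solve(a, b);          // expect x = (1.75, 1.5)
        System.out.println(x[0] + " " + x[1]);
    }
}
```

Checking by hand: 4·1.75 + 2·1.5 = 10 and 2·1.75 + 3·1.5 = 8, so the factor-then-substitute route reproduces b.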