Re: Proposal: scala DSL module for Mahout linear algebra.

Jake Mannix Sat, 27 Jul 2013 08:01:13 -0700

I think my main concern is one of readability and hidden information: I
really _don't_ like having to know _anything_ about associativity rules,
and I'm not sure that catering to R users (*or* matlab users) is what we
want to do.  Maybe I'm thinking in a different direction with my scala
(+scalding) interop work, but I really am not aiming for some totally
fluent API for non-programmer analysts.  I'm not one, and guessing their
needs will be really hard for me.  I just want more concise syntax, better
types, access to a nice REPL, and access to a much more sophisticated yet
compact MR pipelining DSL.  For this, scala + scalding serves admirably.



On Sat, Jul 27, 2013 at 6:10 AM, Dmitriy Lyubimov <[email protected]> wrote:

> On Jul 26, 2013 11:56 PM, "Nick Pentreath" <[email protected]>
> wrote:
> >
> > Thanks for the update on that PR. I will definitely take a look.
> >
> >
> > I wonder if they will run into the exact same Colt issues as mahout did?!
>
> Yes i wondered that too since the day i saw spark als example.
>
> JBlas is a far better choice but as Sebastian has demonstrated bona fide
> improvements are hard to achieve due to high jni costs, so i would actually
> have a specific type of matrix to solve specific problems when needed rather
> than sweepingly generalize it as a dense vector or matrix support.
>
> Aside from that, it seems lapack backend is running up to 5x slower on amd
> hardware that our company unfortunately chose to invest in... argh!..
>
> >
> >
> > This DSL looks great, I'm gonna play around with it as soon as I get a
> > chance.
> >
> >
> >
> > One question - breeze has quite a similar syntax that is a bit simpler in
> > some ways - basically * for matrix multiply and :* for elementwise. Would
> > something similar work here?
>
> As i commented before, it just caters to R syntax, along with a bunch of
> other things. If we believe that there is a reason to inherit syntax vs
> devising something new, then there are really few candidates, and i don't
> think Breeze is going to cut it based on adoption level.
>
> In particular, in my company it is hard to convince R users to start using
> scala or java as it is, so I am just scoring points here by making it look
> familiar to them.
>
> Also i want to reserve the colon to command associativity of operation, as
> scala means it, which is important for optimizing non-commutative
> operations such as elementwise division or matrix multiplication. E.g.
> there are significant performance differences between saying
>
>
Maybe I should step out of the discussion where it dives into what
operators we use, because frankly, I probably won't use them much,
*especially* if there are too many magical associativity rules I have to
remember - I *hate* stuff like:



> A %*% diagonal === A.times(diagonal)
>
> And
>
> A %*%: diagonal === diagonal.timesLeft(A).
>
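
For anyone skimming the thread, the rule behind the two quoted lines is plain
Scala: a method whose name ends in ':' is invoked on the right-hand operand.
A minimal sketch, assuming made-up `Mat` and `Diag` classes that only record
which method got called (they are not Mahout types, and the times/timesLeft
strings just mirror the names quoted above):

```scala
// Hypothetical stand-ins for illustration only; these are not Mahout classes.
object ColonDemo {
  final case class Diag(name: String) {
    // Name ends in ':', so in `a %*%: d` this method is invoked on d,
    // the RIGHT operand, with a passed as the argument.
    def %*%:(left: Mat): String = s"${name}.timesLeft(${left.name})"
  }

  final case class Mat(name: String) {
    // Ordinary operator: invoked on the LEFT operand.
    def %*%(right: Diag): String = s"${name}.times(${right.name})"
  }

  val A = Mat("A")
  val diagonal = Diag("diagonal")

  val plain = A %*% diagonal   // desugars to A.%*%(diagonal)
  val colon = A %*%: diagonal  // desugars to diagonal.%*%:(A)
}
```

Here `ColonDemo.plain` comes out as "A.times(diagonal)" and `ColonDemo.colon`
as "diagonal.timesLeft(A)", so the only thing telling the two dispatch paths
apart at the call site is the trailing colon.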

In particular, pretty much whenever we're going to be doing a map-reduce
job in a method call (for the distributed case), being terribly clever in
our syntax is going to bite us, because people (esp. typical R users, who
aren't super performance focused) will be doing stuff like "(A.t %*% B).t -
(A.t %*% A)" without thinking whether this can be reorganized at all to
reduce the number of map-reduce passes.  Maybe that's ok, but they're going
to super-complain on the list all the time if we give them too much rope to
hang themselves with.
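
To make that concrete: since (A.t %*% B).t equals B.t %*% A, the expression
above is the same as (B - A).t %*% A, which needs one multiplication instead
of two. A minimal sketch, assuming a hypothetical in-memory `M` class (not a
Mahout type) whose %*% bumps a counter as a rough stand-in for a distributed
pass; real pass counts depend on the backend:

```scala
// Hypothetical toy matrix for illustration only; not a Mahout class.
object RewriteDemo {
  // Counts calls to %*%; a crude proxy for "one more expensive pass".
  var multiplies = 0

  final case class M(rows: Vector[Vector[Double]]) {
    def t: M = M(rows.transpose)

    def -(o: M): M =
      M(rows.zip(o.rows).map { case (r, s) =>
        r.zip(s).map { case (x, y) => x - y }
      })

    def %*%(o: M): M = {
      multiplies += 1
      val cols = o.rows.transpose
      M(rows.map(r => cols.map(c => r.zip(c).map { case (x, y) => x * y }.sum)))
    }
  }

  val A = M(Vector(Vector(1.0, 2.0), Vector(3.0, 4.0)))
  val B = M(Vector(Vector(0.0, 1.0), Vector(5.0, 2.0)))

  // The expression as an unsuspecting user would write it: two multiplications.
  val naive = (A.t %*% B).t - (A.t %*% A)
  val naiveMultiplies = multiplies

  // The algebraically equivalent form: one multiplication.
  multiplies = 0
  val rewritten = (B - A).t %*% A
  val rewrittenMultiplies = multiplies
}
```

Both forms give the same matrix, with `naiveMultiplies` at 2 and
`rewrittenMultiplies` at 1. Nothing in the operator syntax nudges the user
toward the second form, which is the worry.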

But yeah, maybe we'll just be looking at two different focuses on this: I
really care more about writing nicer MR pipelines for our jobs (I've
already played around with a nice replacement for seq2sparse in a single
small scalding job with modular components, it's about 1/10th the number of
lines of our current one, with most of the functionality), and getting a
nice integrated REPL for playing with the results.

And maybe getting R (and matlab) users to use our stuff is a good thing,
even if it means them hanging themselves a bit.  Heh.


> Obviously the latter is n flops and the former is n squared.
>
> I don't think breeze made a wise decision by putting a special functional
> meaning into ':'. It is reserved for associativity in scala.
>
> >
> >
> > Would be quite nice to have the same syntax but different backends that are
> > swappable ;)
> > —
> > Sent from Mailbox for iPhone
> >
> > On Sat, Jul 27, 2013 at 2:42 AM, Dmitriy Lyubimov <[email protected]>
> > wrote:
> >
> > > coincidentally, spark MLlib just posted a pull request intended to add
> > > support for dense and sparse vectors, looks quite similar.
> > > https://github.com/mesos/spark/pull/736. They seem to choose JBlas backing
> > > for dense stuff (although at a vector level there's probably not much
> > > reason to) and as-is Colt for sparse stuff.
> > > On Fri, Jul 26, 2013 at 5:20 PM, Dmitriy Lyubimov <[email protected]> wrote:
> > >>
> > >>
> > >>
> > >> On Fri, Jul 26, 2013 at 5:07 AM, Ted Dunning <[email protected]> wrote:
> > >>
> > >>> This sounds great in principle.  I haven't seen any details yet (haven't
> > >>> had time to look).
> > >>>
> > >>> Is there a strong reason to go with the R syntax for multiplication
> > >>> instead
> > >>> of the matlab convention that a*b means a.times(b)?
> > >>>
> > >>
> > >> As discussed, but also because matlab style elementwise operators are
> > >> impossible to keep at proper precedence level in scala. It kind of has to
> > >> start with either '*' or '%' to keep proper precedence, '.*' will not work
> > >> unfortunately. And a mix along the lines of 'some of Matlab, some of perhaps
> > >> something completely different' does not seem appealing at all.
> > >>
> > >>
>
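
On the precedence point in the quoted exchange above: Scala takes an
operator's precedence from its first character, with '*', '/' and '%' binding
tighter than '+' and '-', and ':' binding looser than both. ('.' is not an
operator character at all, so '.*' cannot be defined as a method name.) A
minimal sketch, assuming a hypothetical expression recorder `E` (not a Mahout
or Breeze type) that just writes down how each expression was parsed:

```scala
// Hypothetical expression recorder for illustration only.
object PrecedenceDemo {
  final case class E(s: String) {
    def +(o: E): E = E(s"(${s} + ${o.s})")
    def %*%(o: E): E = E(s"(${s} %*% ${o.s})") // starts with '%': binds like '*'
    def :*(o: E): E = E(s"(${s} :* ${o.s})")   // starts with ':': binds looser than '+'
  }

  val a = E("a")
  val b = E("b")
  val c = E("c")

  // No explicit parentheses in either expression; the parser decides grouping.
  val percent = (a + b %*% c).s
  val colon = (a + b :* c).s
}
```

`PrecedenceDemo.percent` is "(a + (b %*% c))", the grouping a math reader
expects, while `PrecedenceDemo.colon` is "((a + b) :* c)": the addition
happens first unless the user adds parentheses.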



-- 

  -jake
