On Sat, Jul 27, 2013 at 9:40 AM, Dmitriy Lyubimov wrote:
> Jake, this is in-core. I work on similar expressiveness for spark-backed
> DRMs, and there is indeed a different set of algorithms there; naive
> combinations do not necessarily produce the best outcome. There's no
> doubt the MR stuff will need an amended set of operations and primitives.
>
> As far as associativity is concerned, this is just scala. One cannot
> implement elementwise 5-x as 5:-x or 5-x on a sealed left-hand argument;
> such are the language rules, and no amount of discussion on our side can
> change that. You can only do 5 -: x .

Can you show me some examples of where I'd *want* to do the "wrong thing"
from an associativity standpoint? "5 - x", where x is a vector, is kinda
weird. Maybe you're subtracting off a mean or something, but then I'd
probably write this as "- (x - 5)", because I always associate left to
right. :)

> and putting completely different functional meaning into :* and * will
> confuse scala users to no end who got used to things like :/ and /: . This
> all needs striking a subtle balance unfortunately.

Ok, then like I said, maybe I'll just defer to your judgement on the
operator syntax, as I've *never* gotten used to the scala :/ and /: uses.
I prefer method calls to method calls masquerading as native operators.
Maybe I should Stop Being Afraid and Learn to Love the DSL, but I'm not
quite there yet: Too Much Magic. :)

> as i said before, i am not hung up on %*% syntax, but i don't think doing :*
> or .* for elementwise would work in scala.

How often do we really do elementwise matrix operations? Is this really a
thing we often want to worry about? Addition and subtraction, sure, but
those are the full matrix operations too. Ditto for multiplication or
division *by scalars*, but Hadamard products on matrices? I guess it
_happens_, but I'm not sure I've ever done it, or if I have, it's pretty
darn rare.
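For reference, a minimal Scala sketch of the language rule behind "you can only do 5 -: x": an operator whose name ends in a colon is invoked on its right-hand operand, which is the only way to put a scalar on the left of a vector without touching the (sealed) scalar type. The `Vec` class and its `-:` method are hypothetical names made up for this illustration, not the DSL's actual API.

```scala
// Hypothetical toy vector type, for illustration only.
final case class Vec(values: Vector[Double]) {
  // Ordinary operator: `x - 5.0` is compiled as x.-(5.0).
  def -(s: Double): Vec = Vec(values.map(_ - s))

  // The name ends in ':', so the method is invoked on the RIGHT operand:
  // `5.0 -: x` is compiled as x.-:(5.0). Double itself is never extended.
  def -:(s: Double): Vec = Vec(values.map(s - _))
}

object AssociativityDemo {
  def main(args: Array[String]): Unit = {
    val x = Vec(Vector(1.0, 2.0, 3.0))
    println(x - 5.0)  // elementwise x - 5
    println(5.0 -: x) // elementwise 5 - x
  }
}
```

The same mechanism is what lets a form like `A %*%: diagonal` dispatch on `diagonal` rather than on `A`.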
>
> On Sat, Jul 27, 2013 at 8:00 AM, Jake Mannix wrote:
>
> > I think my main concern is one of readability and hidden information: I
> > really _don't_ like having to know _anything_ about associativity rules,
> > and I'm not sure that catering to R users (*or* matlab users) is what we
> > want to do. Maybe I'm thinking in a different direction with my scala
> > (+scalding) interop work, but I really am not aiming for some totally
> > fluent API for non-programmer analysts. I'm not one, and guessing their
> > needs will be really hard for me. I just want more concise syntax, better
> > types, access to a nice REPL, and access to a much more sophisticated yet
> > compact MR pipelining DSL. For this, scala + scalding serves admirably.
> >
> > On Sat, Jul 27, 2013 at 6:10 AM, Dmitriy Lyubimov wrote:
> >
> > > On Jul 26, 2013 11:56 PM, "Nick Pentreath" wrote:
> > > >
> > > > Thanks for the update on that PR I will definitely take a look.
> > > >
> > > > I wonder if they will run into the exact same Colt issues as mahout
> > > > did?!
> > >
> > > Yes i wondered that too since the day i saw spark als example.
> > >
> > > Jblas is far better choice but as Sebastian has demonstrated bona fide
> > > improvements are hard to achieve due to high jni costs, so i would
> > > actually have a specific type of matrix to solve specific problems when
> > > needed rather than sweepingly generalize it as a dense vector or matrix
> > > support.
> > >
> > > Aside from that, it seems lapack backend is running up to 5x slower on
> > > amd hardware that our company unfortunately chose to invest in...
> > > argh!..
> > >
> > > > This DSL looks great, I'm gonna play around with it as soon as I get
> > > > a chance.
> > > >
> > > > One question - breeze has quite a similar syntax that is a bit
> > > > simpler in some ways - basically * for matrix multiply and :* for
> > > > elementwise. Would something similar work here?
> > >
> > > As i commented before, it just caters to R syntax, along with bunch of
> > > other things. If we believe that there is a reason to inherit syntax vs
> > > devising something new, then there are really few candidates, and i
> > > don't think Breeze is going to cut it based on adoption level.
> > >
> > > In particular, in my company it is hard to convince R users to start
> > > using scala or java as it is, so I am just scoring points here by
> > > making it look familiar to them.
> > >
> > > Also i want to reserve the colon to command associativity of operation,
> > > as scala means it, which is important for optimizing non commutative
> > > operations such as elementwise division or matrix multiplication. E.g.
> > > there are significant performance differences between saying
> >
> > Maybe I should step out of the discussion where it dives into what
> > operators we use, because frankly, I probably won't use them much,
> > *especially* if there are too many magical associativity rules I have to
> > remember - I *hate* stuff like:
> >
> > > A %*% diagonal === A.times(diagonal)
> > >
> > > And
> > >
> > > A %*%: diagonal === diagonal.timesLeft(A).
> >
> > In particular, pretty much whenever we're going to be doing a map-reduce
> > job in a method call (for the distributed case), being terribly clever in
> > our syntax is going to bite us, because people (esp. typical R users, who
> > aren't super performance focused) will be doing stuff like
> > "(A.t %*% B).t - (A.t %*% A)" without thinking whether this can be
> > reorganized at all to reduce the number of map-reduce passes. Maybe
> > that's ok, but they're going to super-complain on the list all the time
> > if we give them too much rope to hang themselves with.
> >
> > But yeah, maybe we'll just be looking at two different focuses on this: I
> > really care more about writing nicer MR pipelines for our jobs (I've
> > already played around with a nice replacement for seq2sparse in a single
> > small scalding job with modular components, it's about 1/10th the number
> > of lines of our current one, with most of the functionality), and getting
> > a nice integrated REPL for playing with the results.
> >
> > And maybe getting R (and matlab) users to use our stuff is a good thing,
> > even if it means them hanging themselves a bit. Heh.
> >
> > > Obviously the latter is n flops and the former is n squared.
> > >
> > > I don't think breeze made a wise decision by putting a special
> > > functional meaning into : . It is reserved for associativity in scala.
> > >
> > > > Would be quite nice to have same syntax but different backends that
> > > > are swappable ;)
> > > >
> > > > On Sat, Jul 27, 2013 at 2:42 AM, Dmitriy Lyubimov wrote:
> > > >
> > > > > coincidentally, spark mllib just posted a pull request intended to
> > > > > add support for dense and sparse vectors, looks quite similar.
> > > > > https://github.com/mesos/spark/pull/736. They seem to choose JBlas
> > > > > backing for dense stuff (although at a vector level there's
> > > > > probably not much reason to) and as-is Colt for sparse stuff.
> > > > >
> > > > > On Fri, Jul 26, 2013 at 5:20 PM, Dmitriy Lyubimov wrote:
> > > > >
> > > > >> On Fri, Jul 26, 2013 at 5:07 AM, Ted Dunning wrote:
> > > > >>
> > > > >>> This sounds great in principle. I haven't seen any details yet
> > > > >>> (haven't had time to look).
> > > > >>>
> > > > >>> Is there a strong reason to go with the R syntax for
> > > > >>> multiplication instead of the matlab convention that a*b means
> > > > >>> a.times(b)?
> > > > >>
> > > > >> As discussed, but also because matlab style elementwise operators
> > > > >> are impossible to keep at proper precedence level in scala. It
> > > > >> kind of has to start with either '*' or '%' to keep proper
> > > > >> precedence, '.*' will not work unfortunately. And a mix along the
> > > > >> lines of "some of Matlab, some of perhaps completely something
> > > > >> else" does not seem appealing at all.
> >
> > --
> >
> >   -jake

--

  -jake
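
A footnote on the precedence point quoted above ("it kind of has to start with either '*' or '%'"): Scala assigns infix precedence from the operator's first character, and '.' is not an operator character at all, so '.*' cannot be a method name. The sketch below is only an illustration under assumed names; `Expr` is a hypothetical class that records how the compiler grouped an expression, and none of its operators are taken from an actual library.

```scala
// Hypothetical expression recorder, for illustration only.
final case class Expr(text: String) {
  def +(o: Expr): Expr   = Expr(s"($text + ${o.text})")
  // Starts with '%': same precedence class as '*' and '/'.
  def %*%(o: Expr): Expr = Expr(s"($text %*% ${o.text})")
  // Starts with ':': binds LOOSER than '+' and '-'.
  def :*(o: Expr): Expr  = Expr(s"($text :* ${o.text})")
}

object PrecedenceDemo {
  def main(args: Array[String]): Unit = {
    val (a, b, c) = (Expr("a"), Expr("b"), Expr("c"))
    println((a + b %*% c).text) // (a + (b %*% c))
    println((a + b :* c).text)  // ((a + b) :* c)
  }
}
```

So an elementwise operator spelled `:*` groups after addition unless the user adds parentheses, while one starting with `*` or `%` keeps the usual multiplicative precedence.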
