The concern would be if it creates fragmentation for the project, as in, say, a collection of semi-consistent things (Mahout 1.0).
But there's more commonality in that effort than there isn't. E.g., there are pure algebraic algorithms in math-scala that one can run to compare how they would behave in both cases (for the most part, I assume it is a function of in-core algebra though). And most importantly, as I have always said, the main benefit to me is not that there's a CF algorithm in Mahout, but that I can write a custom one of my own with less effort than writing directly to any given engine API. That part is purely conceptual (DRM-API-dependent) and would be common regardless of my deployment infrastructure. In other words, everyone can write their own co-occurrence analysis version, hopefully more easily than writing it directly for Spark or directly for H2O if they wanted to. That's the real story (at least in my talk). (A minimal sketch of such a DRM-API expression appears after the quoted thread below.)

On Fri, Jul 11, 2014 at 1:50 PM, Pat Ferrel <[email protected]> wrote:

> So given what Dmitriy said and Anand's answers below, why are we trying to
> merge this into Mahout?
>
> Why don't you Anand/0xdata create Mahout2O and take what has been
> delivered? We create Mahout v2 for Spark. We keep Spark-specific code in
> the spark module. You get the algebraic DSL and all the Java math stuff.
> You get all the future solver optimizations and anything you want.
>
> The benefit of keeping things separate for now is that Mahout2O only has
> to deal with h2o duplicated code and Mahout v2 only has to deal with Spark.
> You deal with h2o optimization and we deal with Spark. You know the old
> saying: it's not 2x, it's x^2.
>
> This seems so obvious. If in the future we work out the impedance
> mismatch, IO engine neutrality, etc., then we talk again about merging.
>
> For that matter, I'd be interested in seeing how to make ItemSimilarity
> work on Mahout2O. A performance comparison would help push this one way or
> the other.
>
> On Jul 11, 2014, at 1:36 PM, Dmitriy Lyubimov <[email protected]> wrote:
>
> On Fri, Jul 11, 2014 at 1:28 PM, Anand Avati <[email protected]> wrote:
>
> > c) abandon the pretense/goal that Mahout aims to be backend independent
> > and admit/become Spark specific.
>
> For the record, it has never been said that the work is backend-independent
> for _anything_. The claim has always been much more modest. It was said it
> was backend-independent for R-(Matlab)-like algebraic expressions, which it
> is. And that is not even the main side of the story.
>
> I suspect there are a couple more areas in general math beyond "base" R
> algebra where abstractions can also be built, be useful, and be engine
> independent.
>
> Like I said, the true solution is probably ports of the non-algebraic
> portions of quasi-algebraic solutions (i.e., (b) plus doing something
> h2o-specific for that part of the work if desired). Smart componentization
> of concerns may (or may not) go a long way here (just like in tests).
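A minimal sketch of the kind of backend-independent DRM-API expression referred to above, written against Mahout's math-scala bindings. The package paths, operator imports, and the cooccurrenceCore name here are assumptions based on the Samsara-style DSL and may differ slightly in the branch under discussion; this is an illustration, not the project's actual ItemSimilarity code.

    // Sketch only: the algebraic core of a co-occurrence computation written
    // to the DRM API. The engine behind drmA (Spark, h2o, ...) is abstracted
    // away; the optimizer decides the physical execution plan.
    import org.apache.mahout.math.Matrix
    import org.apache.mahout.math.drm._
    import org.apache.mahout.math.drm.RLikeDrmOps._

    def cooccurrenceCore(drmA: DrmLike[Int]): Matrix = {
      // A'A is the algebraic heart of co-occurrence / item-similarity analysis.
      val drmAtA = drmA.t %*% drmA
      drmAtA.collect // bring the (assumed small) item-item matrix in-core
    }

The point being made in the thread is that a function like this would be handed a DrmLike created by whichever engine's bindings are in use, without the author rewriting it per engine; only the non-algebraic parts would need engine-specific ports.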
