So, given what Dmitriy said and Anand's answers below, why are we trying to merge this into Mahout?

Why don’t you, Anand/oxdata, create Mahout2O and take what has been delivered? We create Mahout v2 for Spark. We keep Spark-specific code in the spark module. You get the algebraic DSL and all the Java math stuff. You get all the future solver optimizations and anything you want.

The benefit of keeping things separate for now is that Mahout2O only has to deal with h2o duplicated code and Mahout v2 only has to deal with Spark. You deal with h2o optimization and we deal with Spark. You know the old saying that it’s not 2x, it’s x^2. This seems so obvious.

If in the future we work out the impedance mismatch, IO engine neutrality, etc., then we talk again about a merge.

For that matter, I’d be interested in seeing how to make ItemSimilarity work on Mahout2O. A performance comparison would help push this one way or the other.

On Jul 11, 2014, at 1:36 PM, Dmitriy Lyubimov <[email protected]> wrote:

On Fri, Jul 11, 2014 at 1:28 PM, Anand Avati <[email protected]> wrote:

>
> c) abandon the pretense/goal that Mahout aims to be backend independent and
> admit/become Spark specific.
>
>

For the record, it has never been said that the work is backend-independent for _anything_. The claim has always been much more modest. It was said that it is backend-independent for R-(matlab)-like algebraic expressions, which it is. And that is not even the main side of the story.

I suspect there are a couple more areas in general math beyond "base" R algebra where abstractions can also be built, be useful, and be engine independent.

Like I said, the true solution is probably ports of the non-algebraic portions of a quasi-algebraic solution (i.e. b + doing something h2o specific for that work if desired). Smart componentization of concerns may (or may not) go a long way here (just like in tests).
