So, given what Dmitriy said and Anand's answers below, why are we trying to
merge this into Mahout?

Why don't you (Anand/0xdata) create Mahout2O and take what has been
delivered? We create Mahout v2 for Spark and keep the Spark-specific code in
the spark module. You get the algebraic DSL and all the Java math stuff. You
get all the future solver optimizations and anything else you want.

The benefit of keeping things separate for now is that Mahout2O only has to
deal with duplicated H2O code and Mahout v2 only has to deal with Spark. You
deal with H2O optimization and we deal with Spark. You know the old saying:
it's not 2x, it's x^2.

This seems so obvious. If in the future we work out the impedance mismatch,
IO engine neutrality, etc., then we can talk about a merge again.

For that matter, I'd be interested in seeing how to make ItemSimilarity work
on Mahout2O. A performance comparison would help push this one way or the
other.

On Jul 11, 2014, at 1:36 PM, Dmitriy Lyubimov <[email protected]> wrote:

On Fri, Jul 11, 2014 at 1:28 PM, Anand Avati <[email protected]> wrote:

> 
> c) abandon the pretense/goal that Mahout aims to be backend independent and
> admit/become Spark specific.
> 
> 
For the record, it has never been claimed that the work is backend-independent
for _everything_. The claim has always been much more modest: it was said to
be backend-independent for R-like (or MATLAB-like) algebraic expressions,
which it is. And that is not even the main side of the story.

I suspect there are a couple more areas in general math, beyond "base" R
algebra, where abstractions can also be built that are useful and
engine-independent.

Like I said, the true solution is probably to port the non-algebraic portions
of quasi-algebraic solutions (i.e., option (b), plus doing something
H2O-specific for that work if desired). Smart componentization of concerns may
(or may not) go a long way here (just like in tests).
