The concern would be if it creates fragmentation for the project, as in, say, a collection of semi-consistent things (Mahout 1.0).
But there's more commonality in that effort than there isn't. E.g., there are pure algebraic algorithms in math-scala that one can run to compare how they would behave in both cases (for the most part, I assume it is a function of in-core algebra though). And most importantly, as I have always said, the main benefit to me is not that there's a CF algorithm in Mahout, but that I can write a custom one of my own with less effort than writing directly to any given engine API. That part is purely conceptual (DRM-API-dependent) and would be common regardless of my deployment infrastructure. In other words, everyone can write their own co-occurrence analysis version, hopefully more easily than writing it directly for Spark or directly for H2O if they wanted to. That's the real story (at least in my talk). (A minimal sketch of such a DRM-API expression appears after the quoted thread below.)

On Fri, Jul 11, 2014 at 1:50 PM, Pat Ferrel <[email protected]> wrote:

> So given what Dmitriy said and Anand's answers below, why are we trying to
> merge this into Mahout?
>
> Why don't you Anand/0xdata create Mahout2O and take what has been
> delivered? We create Mahout v2 for Spark. We keep Spark-specific code in
> the spark module. You get the algebraic DSL and all the Java math stuff.
> You get all the future solver optimizations and anything you want.
>
> The benefit of keeping things separate for now is that Mahout2O only has
> to deal with h2o duplicated code and Mahout v2 only has to deal with Spark.
> You deal with h2o optimization and we deal with Spark. You know the old
> saying: it's not 2x, it's x^2.
>
> This seems so obvious. If in the future we work out the impedance
> mismatch, IO engine neutrality, etc., then we talk again about merging.
>
> For that matter, I'd be interested in seeing how to make ItemSimilarity
> work on Mahout2O. A performance comparison would help push this one way or
> the other.
>
> On Jul 11, 2014, at 1:36 PM, Dmitriy Lyubimov <[email protected]> wrote:
>
> On Fri, Jul 11, 2014 at 1:28 PM, Anand Avati <[email protected]> wrote:
>
> > c) abandon the pretense/goal that Mahout aims to be backend independent
> > and admit/become Spark specific.
>
> For the record, it has never been said that the work is backend-independent
> for _anything_. The claim has always been much more modest. It was said it
> was backend-independent for R-(Matlab)-like algebraic expressions, which it
> is. And that is not even the main side of the story.
>
> I suspect there are a couple more areas in general math beyond "base" R
> algebra where abstractions can also be built, be useful, and be engine
> independent.
>
> Like I said, the true solution is probably ports of the non-algebraic
> portions of quasi-algebraic solutions (i.e., (b) plus doing something
> h2o-specific for that part of the work if desired). Smart componentization
> of concerns may (or may not) go a long way here (just like in tests).
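A minimal sketch of the kind of backend-independent DRM-API expression referred to above, written against Mahout's math-scala bindings. The package paths, operator imports, and the cooccurrenceCore name here are assumptions based on the Samsara-style DSL and may differ slightly in the branch under discussion; this is an illustration, not the project's actual ItemSimilarity code.

    // Sketch only: the algebraic core of a co-occurrence computation written
    // to the DRM API. The engine behind drmA (Spark, h2o, ...) is abstracted
    // away; the optimizer decides the physical execution plan.
    import org.apache.mahout.math.Matrix
    import org.apache.mahout.math.drm._
    import org.apache.mahout.math.drm.RLikeDrmOps._

    def cooccurrenceCore(drmA: DrmLike[Int]): Matrix = {
      // A'A is the algebraic heart of co-occurrence / item-similarity analysis.
      val drmAtA = drmA.t %*% drmA
      drmAtA.collect // bring the (assumed small) item-item matrix in-core
    }

The point being made in the thread is that a function like this would be handed a DrmLike created by whichever engine's bindings are in use, without the author rewriting it per engine; only the non-algebraic parts would need engine-specific ports.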
