At the risk of stepping into a maelstrom where I don't belong, let me
answer some of these:
On 4/27/2014 2:42 PM, Dmitriy Lyubimov (JIRA) wrote:
> *(C)*
> ...because x2o programming model is not rich enough to provide things like
> zipping identically distributed datasets,
We do, and it's "free" - a pointer-copy only. Distribution in H2O is
called a "VectorGroup", and 2 Vecs in the same VectorGroup will have
equal distribution. Zipping them is as easy as: "new Vec[]{vec1,vec2}".
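To make the "zipping is a pointer-copy" point concrete, here is a minimal sketch in plain Java - hypothetical code, not the actual H2O `Vec`/`VectorGroup` API - showing that two vectors with identical chunk boundaries can be walked in lockstep with no data movement:

```java
// Hypothetical sketch (NOT the H2O API): two "vectors" that share the
// same chunking -- the analogue of two Vecs in one VectorGroup -- can
// be "zipped" simply by pairing chunk i of one with chunk i of the other.
final class ZipSketch {
    // Split data into fixed-size chunks; equal distribution means two
    // vectors built this way have chunk boundaries that line up exactly.
    static double[][] chunksOf(double[] data, int chunkSize) {
        int n = (data.length + chunkSize - 1) / chunkSize;
        double[][] chunks = new double[n][];
        for (int i = 0; i < n; i++) {
            int lo = i * chunkSize, hi = Math.min(lo + chunkSize, data.length);
            chunks[i] = java.util.Arrays.copyOfRange(data, lo, hi);
        }
        return chunks;
    }

    // "Zipping" is just iterating the aligned chunk pairs -- the
    // per-chunk analogue of new Vec[]{vec1, vec2}; here we add
    // element-wise to show the pairing.
    static double[] zipSum(double[][] v1, double[][] v2) {
        java.util.List<Double> out = new java.util.ArrayList<>();
        for (int c = 0; c < v1.length; c++)
            for (int i = 0; i < v1[c].length; i++)
                out.add(v1[c][i] + v2[c][i]);
        double[] res = new double[out.size()];
        for (int i = 0; i < res.length; i++) res[i] = out.get(i);
        return res;
    }
}
```

For example, `zipSum(chunksOf({1,2,3}, 2), chunksOf({10,20,30}, 2))` pairs the aligned chunks `[1,2]/[10,20]` and `[3]/[30]` without any reshuffling.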
> very general shuffle model (e.g. many-to-many shuffle),
Again, we do - although by its very design this operation mode is
expensive for everybody - it implies at least O(n) general
communication, sometimes O(n^2).
We try to provide tools to allow people to avoid general shuffles, but
if they want it - it's easily available.
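A hedged illustration of why a general shuffle is inherently expensive - this is a toy sketch in plain Java (the node/record representation is invented for illustration, not any H2O structure): every node may send records to every other node, so the data crosses the wire once, O(n), and with p nodes up to p*p channels are in play:

```java
// Toy many-to-many shuffle (illustrative only, not H2O code):
// hash-partition every record on every source node to a target node.
// Each of the n records moves once, and any source may talk to any
// target -- the all-to-all communication pattern described above.
final class ShuffleSketch {
    static java.util.List<java.util.List<Integer>> shuffle(int[][] perNode, int nNodes) {
        java.util.List<java.util.List<Integer>> out = new java.util.ArrayList<>();
        for (int i = 0; i < nNodes; i++) out.add(new java.util.ArrayList<>());
        for (int[] sourceNode : perNode)
            for (int rec : sourceNode)
                out.get(Math.floorMod(rec, nNodes)).add(rec); // hash -> target
        return out;
    }
}
```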
> advanced partition management (shuffless resplit-coalesce), and so on.
We are trying very hard to do good partition management "under the hood"
and never expose it.
If we have to expose the partitions, then I think this is a sign of a
broken API - although I'm willing to be convinced otherwise given some
cases where a user-rolled partition hack beats what we're doing "under
the hood". So far we've seen only one such case (forced random shuffle
of chunk-size granularity), and we're folding it back into the basic engine.
> I am not even sure if there's a clear concept of combiner type operation.
If by "combiner type" operation, you mean what Mahout calls
"aggregations" - then of course yes we totally support aggregations -
our Map/Reduce paradigm is exactly an aggregation implemented with
generic Java code.
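The map/reduce-as-aggregation pattern can be sketched in plain Java - a hypothetical stand-in, not the real H2O MRTask class, with `map`/`reduce`/`doAll` names borrowed only for the shape of the pattern:

```java
// Minimal sketch of a combiner-style aggregation (hypothetical, not the
// H2O MRTask API): map() builds a partial result per chunk, reduce()
// folds two partials together, and the fold is order-independent --
// which is exactly what makes it a "combiner" in the Hadoop sense.
final class SumTask {
    double _sum;                 // the aggregate under construction

    void map(double[] chunk) {   // runs once per local chunk
        for (double d : chunk) _sum += d;
    }

    void reduce(SumTask other) { // combines another task's partial
        _sum += other._sum;
    }

    static double doAll(double[][] chunks) {
        SumTask total = new SumTask();
        for (double[] c : chunks) {
            SumTask t = new SumTask();
            t.map(c);            // per-chunk partial sum
            total.reduce(t);     // fold partials; order doesn't matter
        }
        return total._sum;
    }
}
```

Because `reduce` is associative and commutative here, the partials can be combined in any order across chunks or nodes.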
Cliff