At the risk of stepping into a maelstrom where I don't belong, let me answer some of these:

On 4/27/2014 2:42 PM, Dmitriy Lyubimov (JIRA) wrote:
(C)
...because x2o programming model is not rich enough to provide things like 
zipping identically distributed datasets,
We do, and it's "free" - a pointer copy only. A distribution in H2O is called a "VectorGroup", and two Vecs in the same VectorGroup have identical distribution. Zipping them is as easy as "new Vec[]{vec1,vec2}".
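Concretely, working over two zipped Vecs looks roughly like the sketch below (a minimal sketch only; class and method names such as MRTask2, Chunk.at0 and doAll are from memory of the API and may differ between H2O releases):

  // Pass both Vecs to one task; because they share a VectorGroup, each map()
  // call sees co-located, row-aligned Chunks - zipping with no data movement.
  import water.MRTask2;
  import water.fvec.Chunk;
  import water.fvec.Vec;

  class PairwiseOp extends MRTask2<PairwiseOp> {
    @Override public void map(Chunk[] cs) {
      Chunk a = cs[0], b = cs[1];            // aligned chunks of vec1 and vec2
      for (int row = 0; row < a._len; row++) {
        double v = a.at0(row) + b.at0(row);  // element-wise work on the pair
        // ... use v ...
      }
    }
  }
  // Usage: new PairwiseOp().doAll(new Vec[]{vec1, vec2});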


very general shuffle model (e.g. many-to-many shuffle),
Again, we do - although by its very design this operation is expensive for everybody - it implies at least O(n) general communication, and sometimes O(n^2). We try to provide tools that let people avoid general shuffles, but if they want one, it's easily available.


advanced partition management (shuffless resplit-coalesce), and so on.
We are trying very hard to do good partition management "under the hood" and never expose it. If we have to expose the partitions, then I think that is a sign of a broken API - although I'm willing to be convinced otherwise, given some cases where a user-rolled partition hack beats what we're doing "under the hood". So far we've seen only one such case (a forced random shuffle at chunk-size granularity), and we're folding it back into the basic engine.


I am not even sure if there's a clear concept of combiner type operation.
If by "combiner type" operation, you mean what Mahout calls "aggregations" - then of course yes we totally support aggregations - our Map/Reduce paradigm is exactly a aggregation implemented with generic Java code.

Cliff
