Re: Straw poll re: H2O ?

Sebastian Schelter Tue, 29 Apr 2014 12:24:26 -0700

Anand,

I'm trying to answer some of your questions, and my answers highlight the points that I would like to see clarified about H2O.


On 04/28/2014 11:13 PM, Anand Avati wrote:

1. Why is the DSL claiming to have (in its vision) logical vs physical
separation if not for providing multiple compute backends?

This is not a claim or a vision; the DSL already has this separation. Take for example o.a.m.sparkbindings.drm.plan.OpAtA: that's the logical operator for executing a Transpose-Times-Self matrix multiplication. In o.a.m.sparkbindings.blas.AtA you will find two physical operator implementations for it. The choice of which one to use depends on whether there is enough memory to hold certain intermediate results.

The primary intention of a separation into logical and physical operators is to allow for a declarative programming style on the user's side and for an optimizer on the system side which automatically chooses the optimal physical operator for the execution of a specific program.

This choice of the physical operator might depend on the shape and amount of the data processed as well as on the underlying available resources. *The separation into logical and physical operators clearly doesn't imply having multiple backends*. It only makes it very easy to support them.
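
To make the distinction concrete, here is a minimal sketch in plain Python. It is not Mahout's code: the class name OpAtA is borrowed from the DSL only as a label, and the two physical functions, the cell-count memory budget and the selection rule are hypothetical stand-ins for whatever the real optimizer does.

```python
from dataclasses import dataclass
from typing import Callable, List

Matrix = List[List[float]]


@dataclass(frozen=True)
class OpAtA:
    """Logical operator: states *what* to compute (A' * A), not how."""
    a: Matrix


def ata_in_memory(a: Matrix) -> Matrix:
    """Hypothetical physical operator 1: materialize A' and multiply."""
    at = [list(col) for col in zip(*a)]  # rows of A' are columns of A
    return [[sum(x * y for x, y in zip(ci, cj)) for cj in at] for ci in at]


def ata_row_outer_products(a: Matrix) -> Matrix:
    """Hypothetical physical operator 2: sum the outer product of each row.

    Only one row of A plus the n x n accumulator is needed at a time.
    """
    n = len(a[0]) if a else 0
    acc = [[0.0] * n for _ in range(n)]
    for row in a:
        for i, x in enumerate(row):
            for j, y in enumerate(row):
                acc[i][j] += x * y
    return acc


def choose_physical(op: OpAtA, memory_budget_cells: int) -> Callable[[Matrix], Matrix]:
    """Toy optimizer rule (an assumption): pick by whether A' fits the budget."""
    rows = len(op.a)
    cols = len(op.a[0]) if op.a else 0
    if rows * cols <= memory_budget_cells:
        return ata_in_memory
    return ata_row_outer_products


def execute(op: OpAtA, memory_budget_cells: int) -> Matrix:
    """The user only builds the logical operator; the system picks the plan."""
    return choose_physical(op, memory_budget_cells)(op.a)
```

Calling execute(OpAtA(a), budget) with a large or a small budget runs a different physical operator but returns the same matrix, and all of this happens within a single backend.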


2. Does the proposal of having a new DSL backend in the future (for e.g
stratosphere as suggested elsewhere) make you:

-- worry that stratosphere would be a dependency to Mahout?

Stratosphere was recently accepted as an incubator project at the ASF, so the worry about such a dependency is naturally smaller than for an externally managed project like H2O.

-- worry that as a user/commiter/contributor you have to worry about a new
framework?

In my eyes, there is a big difference between Spark/Stratosphere and H2O. Spark and Stratosphere have a clearly defined programming and execution model. They execute programs that are composed of a DAG of operators. The set of operators has clearly defined semantics and parallelization strategies. If you compare their operators, you will find that they offer pretty much the same in slightly different flavors. For both, there are scientific papers that explain all these things in detail.

I have asked for a detailed description of H2O's programming model and execution model and I searched the documentation, but I haven't been able to find anything that clearly describes how things are done. I would love to read up on this, but until I'm presented with it, I have to assume that such a principled foundation is missing.



--sebastian
