Anand,

I'll try to answer some of your questions; my answers also highlight the points I would like to see clarified about H2O.

On 04/28/2014 11:13 PM, Anand Avati wrote:

1. Why is the DSL claiming to have (in its vision) logical vs physical
separation if not for providing multiple compute backends?

This is neither a claim nor a vision; the DSL already has this separation. Take, for example, o.a.m.sparkbindings.drm.plan.OpAtA, which is the logical operator for executing a transpose-times-self matrix multiplication. In o.a.m.sparkbindings.blas.AtA you will find two physical operator implementations for it. The choice of which one to use depends on whether there is enough memory to hold certain intermediate results.

The primary intention of a separation into logical and physical operators is to allow for a declarative programming style on the user's side and for an optimizer on the system side that automatically chooses the best physical operator for executing a specific program.

This choice of physical operator might depend on the shape and amount of the data processed as well as on the available underlying resources. *The separation into logical and physical operators clearly doesn't imply having multiple backends*. It only makes it very easy to support them.
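
To make the distinction concrete, here is a minimal sketch of the idea, written in plain Python for brevity rather than in the DSL's Scala. Only the name OpAtA is taken from the example above. The two physical implementations, the function names and the memory rule are hypothetical illustrations, not Mahout's actual code or cost model.

```python
from dataclasses import dataclass
from typing import Callable, List

Matrix = List[List[float]]


@dataclass(frozen=True)
class OpAtA:
    """Logical operator: states *what* to compute (A' * A), not how."""
    a: Matrix


def ata_in_memory(a: Matrix) -> Matrix:
    """Hypothetical physical operator 1: accumulate the whole n x n result
    at once as a sum of outer products of the rows of A."""
    n = len(a[0])
    result = [[0.0] * n for _ in range(n)]
    for row in a:
        for i in range(n):
            for j in range(n):
                result[i][j] += row[i] * row[j]
    return result


def ata_row_at_a_time(a: Matrix) -> Matrix:
    """Hypothetical physical operator 2: compute one output row at a time,
    so only a single length-n working buffer is needed per step."""
    n = len(a[0])
    result = []
    for i in range(n):
        result.append([sum(row[i] * row[j] for row in a) for j in range(n)])
    return result


def choose_physical(op: OpAtA, memory_budget_cells: int) -> Callable[[Matrix], Matrix]:
    """Toy optimizer rule (an assumption): use the in-memory variant only
    if the n x n intermediate result fits into the given budget."""
    n = len(op.a[0])
    if n * n <= memory_budget_cells:
        return ata_in_memory
    return ata_row_at_a_time


def execute(op: OpAtA, memory_budget_cells: int) -> Matrix:
    """Run the logical operator with whichever physical operator was chosen."""
    return choose_physical(op, memory_budget_cells)(op.a)
```

Note that the user only ever constructs OpAtA and calls execute. Which physical operator runs is decided by the system, and nothing in this sketch requires more than one backend.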


2. Does the proposal of having a new DSL backend in the future (for e.g
stratosphere as suggested elsewhere) make you:

-- worry that stratosphere would be a dependency to Mahout?

Stratosphere was recently accepted as an incubator project at the ASF, so the worry about such a dependency is naturally smaller than for an externally managed project like H2O.

-- worry that as a user/committer/contributor you have to worry about a new
framework?

In my eyes, there is a big difference between Spark/Stratosphere and H2O. Spark and Stratosphere have a clearly defined programming and execution model. They execute programs that are composed of a DAG of operators. The set of operators has clearly defined semantics and parallelization strategies. If you compare their operators, you will find that they offer pretty much the same thing in slightly different flavors. For both, there are scientific papers that explain all of this in detail.

I have asked for a detailed description of H2O's programming model and execution model, and I have searched the documentation, but I haven't been able to find anything that clearly describes how things are done. I would love to read up on this, but until I'm presented with it, I have to assume that such a principled foundation is missing.


--sebastian
