That would be nice, but what I have is a personal, unpublished evaluation based on my own use. It isn't a formal evaluation.
We should encourage the 0xdata team to show us what they can do.

On Thu, May 1, 2014 at 1:25 AM, Dmitriy Lyubimov <[email protected]> wrote:

> sure. I assume this should include statements that something crushes
> something without providing a link to a published analysis of what it is
> something that crushes something another and due to what something.
>
>
> On Wed, Apr 30, 2014 at 4:16 PM, Ted Dunning <[email protected]>
> wrote:
>
> > It seems to me that Sebastian and Ellen have hit on the right tack.
> >
> > Let's get back to work making something cool here. Let's build this
> > community up instead of having endlessly divisive discussions.
> >
> > Let's get back to the Apache emphasis on do-acracy.
> >
> >
> >
> > On Wed, Apr 30, 2014 at 11:36 AM, Ellen Friedman <[email protected]>
> > wrote:
> >
> > > I am weighing in here on issues of great concern but non-technical.
> > >
> > > 1. One of the great things about Mahout is the community - not an easy
> > > thing to have achieved given that people are dispersed geographically
> > > and there is no single focus or company backing the project. In short,
> > > the people who make Mahout are doing something cool.
> > >
> > > Suggestions to try to break it into different groups, Mahout-Spark and
> > > Mahout2o, run counter to this success. Why fragment it at exactly the
> > > moment when new contributors (from 0xdata) are coming forward? The
> > > spirit of this project has been inclusive. Let's not change that now.
> > >
> > > 2. Sebastian pointed out:
> > >
> > > "We agreed to give the h2o guys a shot for exploration of a possible
> > > integration into Mahout. We should be grateful that they are investing
> > > a lot of time into this, and should help wherever we can. Once they
> > > come up with a concrete proposal or patch, we will have a look at it,
> > > have a deep, technical and polite discussion, and make a decision
> > > afterwards."
> > >
> > > +1
> > >
> > > We agreed to explore the h2o option.
> > > Why use lots of time and energy in re-visiting and second-guessing
> > > that decision? Let it go forward; likely some great things will
> > > emerge for Mahout, and if not, then we say "thank you" to h2o
> > > contributors for giving it a try.
> > >
> > > As the guys from h2o are adding new resources to do this development,
> > > it is not really detracting anything from Mahout's resources except
> > > when someone opens one of these discussions that lead to fragmentation
> > > and distraction. I'm not a coder and not as technical as any of you,
> > > but from my view it seems to be the talk and not the development that
> > > is distracting.
> > >
> > > 3. Over the last year, there has been growing and widespread interest
> > > in Mahout from the outside world, and now, with the new changes to
> > > support Scala, Spark and h2o (possibly Stratosphere later) the growing
> > > interest has turned into excitement. This is a great time for the
> > > project - tons of effort but moving toward a big result.
> > >
> > > Users will have some excellent new choices, all parts of Mahout will
> > > benefit. And if in the future it is seen that some of the new features
> > > are not being widely or successfully used, they will be deprecated, as
> > > was done during the big clean-up of the 0.8 release. New choices, new
> > > ways to use Mahout, new people getting involved - this is excellent.
> > >
> > > 4. My thought is, stick together, embrace change, welcome newcomers
> > > and be very proud to be building the new Mahout.
> > >
> > >
> > >
> > > On 4/29/14, Sebastian Schelter <[email protected]> wrote:
> > > > For reasons of transparency in this discussion, I should add that I
> > > > am a committer on the upcoming Stratosphere ASF podling, co-worker
> > > > of the main developers and have contributed to it as part of my PhD.
> > > >
> > > > On 04/29/2014 09:23 PM, Sebastian Schelter wrote:
> > > >> Anand,
> > > >>
> > > >> I'm trying to answer some of your questions, and my answers
> > > >> highlight the points that I would like to see clarified about h2o.
> > > >>
> > > >> On 04/28/2014 11:13 PM, Anand Avati wrote:
> > > >>
> > > >>> 1. Why is the DSL claiming to have (in its vision) logical vs
> > > >>> physical separation if not for providing multiple compute backends?
> > > >>
> > > >> This is not a claim or a vision; the DSL already has this
> > > >> separation. Take for example o.a.m.sparkbindings.drm.plan.OpAtA,
> > > >> that's the logical operator for executing a Transpose-Times-Self
> > > >> matrix multiplication. In o.a.m.sparkbindings.blas.AtA you will
> > > >> find two physical operator implementations for that. The choice of
> > > >> which one to use depends on whether there is enough memory to hold
> > > >> certain intermediary results.
> > > >>
> > > >> The primary intention of a separation into logical and physical
> > > >> operators is to allow for a declarative programming style on the
> > > >> user's side and for an optimizer on the system side which
> > > >> automatically chooses the optimal physical operator for the
> > > >> execution of a specific program.
> > > >>
> > > >> This choice of the physical operator might depend on the shape and
> > > >> amount of the data processed as well as on the underlying available
> > > >> resources. *The separation into logical and physical operators
> > > >> clearly doesn't imply having multiple backends*. It only makes it
> > > >> very easy to support them.
> > > >>
> > > >>>
> > > >>> 2. Does the proposal of having a new DSL backend in the future
> > > >>> (e.g. Stratosphere, as suggested elsewhere) make you:
> > > >>
> > > >>> -- worry that Stratosphere would be a dependency to Mahout?
> > > >>
> > > >> Stratosphere has been accepted as an incubator project in the ASF
> > > >> recently, so the worry about such a dependency is naturally less
> > > >> than about an externally managed project like h2o.
> > > >>
> > > >>> -- worry that as a user/committer/contributor you have to worry
> > > >>> about a new framework?
> > > >>
> > > >> In my eyes, there is a big difference between Spark/Stratosphere
> > > >> and h2o. Spark and Stratosphere have a clearly defined programming
> > > >> and execution model. They execute programs that are composed of a
> > > >> DAG of operators. The set of operators has clearly defined
> > > >> semantics and parallelization strategies. If you compare their
> > > >> operators, you will find that they offer pretty much the same in
> > > >> slightly different flavors. For both, there are scientific papers
> > > >> that explain all these things in detail.
> > > >>
> > > >> I have asked about a detailed description of h2o's programming
> > > >> model and execution model and I searched the documentation, but I
> > > >> haven't been able to find something that clearly describes how
> > > >> things are done. I would love to read up on this, but until I'm
> > > >> presented with this, I have to assume that such a principled
> > > >> foundation is missing.
> > > >>
> > > >>
> > > >> --sebastian
