+1. I don't think anyone said anything, privately or publicly, about H2O integration being a bad idea. It's just that there's more than one way to do it, so the debate is focusing on exploring the pluses and minuses of each individual proposal (as they come to light). Part of the difficulty here is that few individuals have expertise spanning all the parts being connected and integrated. So we have to go by scenarios where a group of specialized experts tries to figure out the solution.
W.r.t. the incubation proposals, they seem dubious for a number of reasons.

Reason 1 is that these projects are the primary factor moving Mahout forward at all. Without them, given the "bye-bye mapreduce" JIRA, there's frankly not much left in Mahout, so it is a reflection of a more or less common opinion that the project would just spiral down on its own if things stay at the status quo.

Reason 2 is that there are good (not irreplaceable, but good) components in Mahout that these efforts depend on. Therefore, an incubating project would face the prospect of depending on a project that is itself winding down. Not good for the incubation side.

Reason 3 is that the current effort is (IMO) minimalistic enough not to warrant a new project. It simply doesn't, and can't, have the scale of things like Spark or the Hadoop ecosystem. There would just not be enough substance for a new project at this point. I don't feel very strongly about this point though.

On Mon, Apr 28, 2014 at 11:09 AM, Sebastian Schelter <[email protected]> wrote:

> We all should calm down here and remind ourselves why we are doing this
> whole thing: Because we love open source and want to have a vibrant
> community and a great piece of software.
>
> Mahout has come a long way and is at a crossroads right now, so its only
> natural that there are heated discussions. But, we should immediately stop
> the fingerpointing and related stuff, we have managed to avoid this since
> Mahout's inception and we should continue to do so.
>
> The best way to help Mahout is to pick up some of the work that needs to
> be done with regards to documentation, examples, Hadoop 2 compatibility and
> designing the future, especially with regards to dataframes e.g.
>
> We agreed to give the h2O guys a shot for exploration of a possible
> integration into Mahout. We should be grateful that they are investing a
> lot of time into this, and should help whereever we can.
> Once they come up with a concrete proposal or patch, we will have a look
> at it, have a deep, technical and polite discussion, and make a decision
> afterwards.
>
> --sebastian
>
> On 04/28/2014 07:42 PM, Anand Avati wrote:
>
>> On Mon, Apr 28, 2014 at 2:18 AM, Sean Owen <[email protected]> wrote:
>>
>>> On Mon, Apr 28, 2014 at 3:39 AM, Dmitriy Lyubimov (JIRA)
>>> <[email protected]> wrote:
>>>
>>>> bq. The emotional tenor of Dmitriy Lyubimov's comments are exactly what
>>>> is encouraging the h2o work to be done a bit apart. It simply isn't
>>>> efficient to have to answer so many off-topic points whenever any reports
>>>> on work in progress are given.
>>>>
>>>> I think this has been the off-topic here.
>>>>
>>>> Calling my comments "emotional" or "non-technical", or _loosely_
>>>> paraphrasing me.
>>>
>>> Yes, the personal finger-pointing parts don't belong and don't
>>> convince anyone, let's skip those.
>>
>> +1. Let's skip those.
>>
>>> From the sidelines, I see a bunch of work intended for Mahout
>>> proceeding outside the community such as it is, and even Apache. Of
>>> course, contributions are always prepped externally to some degree. I
>>> create, debug, change patches before posting them, maybe checking in
>>> early on choices that others may want input on.
>>>
>>> This is a large-ish change being proposed, IIUC. I can see one person
>>> who publicly, and at least two who privately, have clear reservations
>>> about this direction.
>>
>> It will probably be a large-ish change, indeed. But my personal take is
>> that, non-technical aspects of the debate is unfortunately taking
>> precedence over real technical parts. Please refer to email thread "Mahout
>> DSL vs Spark".
>>
>>> It certainly appears funny vis-a-vis the "Apache
>>> way" to work on a contribution *because* one (or more) other
>>> committers aren't convinced.
>>
>> As mentioned in the referred email thread, a lot of the technical issues
>> which got addressed in the work which was carried out outside of Apache,
>> was really sorting out and highlighting build and classloader related
>> challenges on the H2O side. There was little motivation to carry out those
>> discussions on the Mahout lists as it was really ~99% H2O specific
>> discussions and noise/spam to the Mahout community.
>>
>>> I don't think that's important to dither about. What is, is this: if a
>>> big-bang patch landed tomorrow, I wonder if it would pass a VOTE?
>>> Nobody can pre-judge his/her opinion on a proposal that's not tabled
>>> yet, but it seems like a quite possible outcome.
>>
>> As an outsider, my opinion is that the proposed need for a VOTE is a
>> largely masqueraded problem built around the perception of disagreement
>> over something vague, abstract and inaccurate. And therefore premature.
>> That being said the PMC may vote on any issues/non-issues it may please.
>>
>>> Would be a shame to do a lot of work, intending it for a commit, and
>>> then find there is not consensus.
>>
>> Exactly the kind of inaccurate perception I meant. While we are (at least
>> I am) exploring the best fit model for integration, and exploration by
>> definition involves taking potentially wrong steps and backtracking if
>> necessary, the perception unfortunately seems to be that the proposed
>> intermediate (potentially wrong) steps are some kind of pre-decided plan
>> of action. So, no, there WOULDN'T be a lot of work intended for a commit
>> against consensus.
>>
>>> So is it better to figure out earlier than later whether these 2+
>>> parallel tracks have enough commonality to coexist?
>>
>> Whether two parallel tracks (I assume the spark track and the H2O track?)
>> have enough commonality to exist - one way you surely cannot get the right
>> answer for this (except by co-incidence) is by taking a vote from a group
>> who are experts in only either one of those tracks. From what I see, most
>> of the opposition has been due to a combination of lack of understanding
>> of H2O and (welcome) skepticism. If, as a contributor, I find there is no
>> natural or beneficial way to co-exist with Spark, I wouldn't waste my time
>> writing code, and for sure am not dependent on another group's vote to
>> make that decision for me.
>>
>> Avati
