The document does not mention the state of the existing Spark work in the snapshot codebase. Shouldn’t this be noted?
On Apr 7, 2014, at 5:06 AM, Sebastian Schelter <s...@apache.org> wrote:

I think we should mention the redesign/rework of the website and the completion of the move from the old wiki to Apache CMS.

--sebastian

On 04/07/2014 02:04 PM, Grant Ingersoll wrote:
> Here is my proposed report. For the most part, I think the only right thing
> to do vis-a-vis the Board is to report that we are in the midst of a healthy
> (yes, I believe it is, for the most part healthy and normal) discussion on
> where to go next.
>
> PMC Members: this is checked into SVN at
> https://svn.apache.org/repos/asf/mahout/pmc/board-reports/2014/board-report-apr.txt.
> It is due on Wednesday. If you object to this approach of reporting,
> please let me know ASAP and suggest alternatives.
>
> === Apache Mahout Status Report: April 2014 ===
>
> -----
>
> Apache Mahout has implementations of a wide range of machine learning and
> data mining algorithms: clustering, classification, collaborative filtering
> and frequent pattern mining
>
> Project Status
> --------------
>
> The project continues to have a large and active user base. While
> the developer base has continued to grow, there is a very active
> and healthy debate going on about where Mahout goes next. Please
> see the Issues section below for more details.
>
> Community
> ---------
>
> * Andrew Musselman was voted in as new committer.
> * No changes to the PMC in the reporting period.
>
> * The main issue concerning the community right now is the addition
> of new contributions from 0xData and the integration of Mahout with Spark.
>
> Community Objectives
> --------------------
>
> Our goal is to build scalable machine learning libraries. See the Issues
> section below for the debate in the community about our objectives.
>
> Releases
> --------
>
> In addition to an ongoing debate on Mahout's future, the community is actively
> working on integrating Mahout with Scala/Spark, updating
> documentation, and bringing in new code and committers to update the core
> project.
>
> Issues
> ------
> The Mahout community is at a crossroads in terms of where
> to go next. While the project has a broad number of users and interested
> parties, most committers are trying to maintain the code base on a purely
> part time basis, when the amount of work to sustain these users
> clearly points to it needing to
> be full time. Furthermore, much of our original code base is written
> for Hadoop MapReduce 1.0, which many in the community have come to realize
> is not well-suited for solving the kinds of problems that Mahout has set
> out to solve. There have been several lengthy discussions and prototypes
> going on to work out next directions along the lines of the Spark and
> 0xData contributions (there are numerous threads on the dev@mahout.a.o
> mailing list.)
>
> The PMC does not think this requires Board intervention at this time
> as the debate is, as far as we can tell, healthy. We do, however,
> expect that this debate will take some time to resolve and may mean we
> won't be shipping a 1.0 release any time soon. We will keep the Board
> apprised of our next steps as we work through the process.
>
> On Apr 7, 2014, at 4:53 AM, Grant Ingersoll <gsing...@apache.org> wrote:
>
>> To Sean's point, if Mahout were "my company", I would do the following,
>> albeit pragmatic and not so pleasant thing, assuming, of course, I had the
>> $$$ to do so:
>>
>> 1. Clean up existing code with a laser focus on a few key areas (Sebastian's
>> list makes sense) using a part of the team and call it 1.0 and ship it, as
>> it has a number of users and they deserve to not have the rug pulled out
>> from under them.
>>
>> 2. Spin out a subset of the team to explore and prototype 2.0 based on two
>> very positive and re-energizing looking ideas:
>> a. Scala DSL (and maybe Spark)
>> b. 0xData
>>
>> All of the work for #2 would be done in a clean repo and would only
>> bring in legacy code where it was truly beneficial (back compat. can come
>> later, if at all).
>> It would then benchmark those two approaches as well as look at where
>> they overlap and are mutually beneficial and then go forward with the winner.
>>
>> 3. Once #2 is viable, put most effort into it and maintain 1.0 with as
>> minimal support as possible, encouraging, nay -- actively helping -- 1.0
>> customers upgrade as quickly as possible.
>>
>> The tricky part then becomes how do you make sure to still make your sales
>> #'s while also convincing them that your roadmap is what they are really
>> buying.
>>
>> If I didn't have the $$$ to do both of these (i.e. we need a massive turn
>> around and we have one last shot), I would be all in on #2.
>>
>> -----------------------------------
>>
>> That being said, Mahout is not "my company". Heck, Mahout is not even a
>> "company", so we don't need to be bound by company conventions and thought
>> processes, even if that fits with all of our individual day jobs. And,
>> thankfully, we don't have any sales numbers to make.
>>
>> We are chartered with one and only one mission: produce open source,
>> scalable machine learning libraries under the Apache license and community
>> driven principles. We are not required by the Board or anyone else to
>> support version X for Y years or to use Hadoop or Scala or Java. We are
>> also not required to implement any specific algorithms or deliver them on
>> specific time frames. We are also not required to provide users upgrade
>> paths or the like. Naturally, we _want_ to do these things for the sake of
>> the community, but let's be clear: it is not a requirement from the ASF. We
>> are, however, required to have a sustaining community.
>>
>> ------------------------------------
>>
>> I personally think we should start clean on #2, throwing off the shackles of
>> the past and emerge 6-9 months later with Mahout 2.0 (and yes, call it that,
>> not 0.1 as Sebastian suggests, for marketing reasons) built on a completely
>> new and fresh repository, likely bringing in only the Math/collections
>> underpinnings and maybe the build system. This new repository would have
>> only a handful of core algorithms that we know are well implemented,
>> sustainable and best in class.
>>
>> I think we should look at the lead up to 0.9 as an experiment that proved
>> out a lot of interesting ideas, including the fact that Mahout proved there
>> is vast interest in open source large scale machine learning and that it is
>> the benchmark for comparison. Not many other ML projects can say that, even
>> if they have better technical implementations or are less fragmented. Once
>> you realize something has outlived its usefulness in software, however,
>> there is no point in lingering.
>>
>> That being said, at least for the foreseeable future, I am not in a position
>> to contribute much code. So, from my perspective, the ASF Meritocratic
>> approach takes over: those who do the work make the decisions. If you want
>> something in, then put up the patch and ask for feedback. If no one
>> provides feedback, assume lazy consensus and move forward. Nothing
>> convinces people better than actual, real, executing code. For my part, I
>> am happy to continue to work the bureaucratic side of things to make sure
>> reports get filed, credentials get created, etc. and the occasional patch.
>> I hope one day I will have time to contribute again.
>>
>> I will follow up w/ a separate email on what I am going to put in the Board
>> Report.
>>
>> On Apr 7, 2014, at 1:52 AM, Sean Owen <sro...@gmail.com> wrote:
>>
>>> No, it's about the opposite. I'm referring to the default, current
>>> state of play here.
>>>
>>> The issues for a vendor are demand and supportability. Do people want
>>> to pay for support of X? Can you honestly say you have expertise to
>>> support and influence X over at least a major release cycle (12-18
>>> months)? The latter needs a reasonably reliable roadmap and
>>> continuity.
>>>
>>> I'm suggesting that in the current state, demand is low and going
>>> down. The current code base seems de facto deprecated/unsupported
>>> already, and possibly to be removed or dramatically changed into
>>> something as-yet unclear. Nobody here seems to have taken a hard
>>> decision regarding a next major release, but, the trajectory of that
>>> decision seems clear if the current state remains the same.
>>>
>>> From my perspective, "middle-ground" new directions like adding a bit
>>> of H2O, a bit of Spark, leaving bits of M/R code around, etc. are only
>>> worse. I can see why there may be a little renewed demand for the new
>>> bits, but then, why not go all in on one of them?
>>>
>>> Because a substantially all-new direction is a different story. If a
>>> "Mahout2O" or "Spahout" ("Mark"?) emerges as a plan, I could imagine a
>>> lot of renewed demand. And a clearer underlying roadmap sounds
>>> possible. It would remain to be seen, but there's nothing stopping
>>> those ideas from becoming part of a distro too.
>>>
>>> On Mon, Apr 7, 2014 at 6:22 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>>> Please be explicit here. It sounds like you are saying that if Mahout goes
>>>> in the proposed new direction that Cloudera will drop Mahout.
>>>>
>>>> Is that what you mean to say?
>>
>
> --------------------------------------------
> Grant Ingersoll | @gsingers
> http://www.lucidworks.com