Good point, please update the report (you should have credentials) -Grant
On Apr 7, 2014, at 5:06 AM, Sebastian Schelter <s...@apache.org> wrote:

> I think we should mention the redesign/rework of the website and the
> completion of the move from the old wiki to Apache CMS.
>
> --sebastian
>
> On 04/07/2014 02:04 PM, Grant Ingersoll wrote:
>> Here is my proposed report. For the most part, I think the only right thing
>> to do vis-a-vis the Board is to report that we are in the midst of a healthy
>> (yes, I believe it is, for the most part healthy and normal) discussion on
>> where to go next.
>>
>> PMC Members: this is checked into SVN at
>> https://svn.apache.org/repos/asf/mahout/pmc/board-reports/2014/board-report-apr.txt.
>> It is due on Wednesday. If you object to this approach of reporting,
>> please let me know ASAP and suggest alternatives.
>>
>> === Apache Mahout Status Report: April 2014 ===
>>
>> -----
>>
>> Apache Mahout has implementations of a wide range of machine learning and
>> data mining algorithms: clustering, classification, collaborative filtering
>> and frequent pattern mining
>>
>> Project Status
>> --------------
>>
>> The project continues to have a large and active user base. While
>> the developer base has continued to grow, there is a very active
>> and healthy debate going on about where Mahout goes next. Please
>> see the Issues section below for more details.
>>
>> Community
>> ---------
>>
>> * Andrew Musselman was voted in as new committer.
>> * No changes to the PMC in the reporting period.
>>
>> * The main issue concerning the community right now is the addition
>>   of new contributions from 0xData and the integration of Mahout with Spark.
>>
>> Community Objectives
>> --------------------
>>
>> Our goal is to build scalable machine learning libraries. See the Issues
>> section below for the debate in the community about our objectives.
>>
>>
>> Releases
>> --------
>>
>> In addition to an ongoing debate on Mahout's future, the community is
>> actively working on integrating Mahout with Scala/Spark, updating
>> documentation, and bringing in new code and committers to update the core
>> project.
>>
>>
>> Issues
>> ------
>> The Mahout community is at a crossroads in terms of where
>> to go next. While the project has a broad number of users and interested
>> parties, most committers are trying to maintain the code base on a purely
>> part time basis, when the amount of work to sustain these users
>> clearly points to it needing to be full time. Furthermore, much of our
>> original code base is written for Hadoop MapReduce 1.0, which many in the
>> community have come to realize is not well-suited for solving the kinds of
>> problems that Mahout has set out to solve. There have been several lengthy
>> discussions and prototypes going on to work out next directions along the
>> lines of the Spark and 0xData contributions (there are numerous threads on
>> the dev@mahout.a.o mailing list.)
>>
>> The PMC does not think this requires Board intervention at this time
>> as the debate is, as far as we can tell, healthy. We do, however,
>> expect that this debate will take some time to resolve and may mean we
>> won't be shipping a 1.0 release any time soon. We will keep the Board
>> apprised of our next steps as we work through the process.
>>
>>
>> On Apr 7, 2014, at 4:53 AM, Grant Ingersoll <gsing...@apache.org> wrote:
>>
>>> To Sean's point, if Mahout were "my company", I would do the following,
>>> albeit pragmatic and not so pleasant thing, assuming, of course, I had the
>>> $$$ to do so:
>>>
>>> 1. Clean up existing code with a laser focus on a few key areas
>>> (Sebastian's list makes sense) using a part of the team and call it 1.0 and
>>> ship it, as it has a number of users and they deserve to not have the rug
>>> pulled out from under them.
>>>
>>> 2. Spin out a subset of the team to explore and prototype 2.0 based on two
>>> very positive and re-energizing looking ideas:
>>> a. Scala DSL (and maybe Spark)
>>> b. 0xData
>>>
>>> All of the work for #2 would be done in a clean repo and would only
>>> bring in legacy code where it was truly beneficial (back compat. can come
>>> later, if at all).
>>> It would then benchmark those two approaches as well as look at where
>>> they overlap and are mutually beneficial and then go forward with the
>>> winner.
>>>
>>> 3. Once #2 is viable, put most effort into it and maintain 1.0 with as
>>> minimal support as possible, encouraging, nay -- actively helping -- 1.0
>>> customers upgrade as quickly as possible.
>>>
>>> The tricky part then becomes how do you make sure to still make your sales
>>> #'s while also convincing them that your roadmap is what they are really
>>> buying.
>>>
>>> If I didn't have the $$$ to do both of these (i.e. we need a massive turn
>>> around and we have one last shot), I would be all in on #2.
>>>
>>> -----------------------------------
>>>
>>> That being said, Mahout is not "my company". Heck, Mahout is not even a
>>> "company", so we don't need to be bound by company conventions and thought
>>> processes, even if that fits with all of our individual day jobs. And,
>>> thankfully, we don't have any sales numbers to make.
>>>
>>> We are chartered with one and only one mission: produce open source,
>>> scalable machine learning libraries under the Apache license and community
>>> driven principles. We are not required by the Board or anyone else to
>>> support version X for Y years or to use Hadoop or Scala or Java. We are
>>> also not required to implement any specific algorithms or deliver them on
>>> specific time frames. We are also not required to provide users upgrade
>>> paths or the like. Naturally, we _want_ to do these things for the sake of
>>> the community, but let's be clear: it is not a requirement from the ASF.
>>> We are, however, required to have a sustaining community.
>>>
>>> ------------------------------------
>>>
>>> I personally think we should start clean on #2, throwing off the shackles
>>> of the past and emerge 6-9 months later with Mahout 2.0 (and yes, call it
>>> that, not 0.1 as Sebastian suggests, for marketing reasons) built on a
>>> completely new and fresh repository, likely bringing in only the
>>> Math/collections underpinnings and maybe the build system. This new
>>> repository would have only a handful of core algorithms that we know are
>>> well implemented, sustainable and best in class.
>>>
>>> I think we should look at the lead up to 0.9 as an experiment that proved
>>> out a lot of interesting ideas, including the fact that Mahout proved there
>>> is vast interest in open source large scale machine learning and that it is
>>> the benchmark for comparison. Not many other ML projects can say that,
>>> even if they have better technical implementations or are less fragmented.
>>> Once you realize something has outlived its usefulness in software,
>>> however, there is no point in lingering.
>>>
>>> That being said, at least for the foreseeable future, I am not in a
>>> position to contribute much code. So, from my perspective, the ASF
>>> Meritocratic approach takes over: those who do the work make the
>>> decisions. If you want something in, then put up the patch and ask for
>>> feedback. If no one provides feedback, assume lazy consensus and move
>>> forward. Nothing convinces people better than actual, real, executing
>>> code. For my part, I am happy to continue to work the bureaucratic side of
>>> things to make sure reports get filed, credentials get created, etc. and
>>> the occasional patch. I hope one day I will have time to contribute again.
>>>
>>> I will follow up w/ a separate email on what I am going to put in the Board
>>> Report.
>>>
>>> On Apr 7, 2014, at 1:52 AM, Sean Owen <sro...@gmail.com> wrote:
>>>
>>>> No, it's about the opposite. I'm referring to the default, current
>>>> state of play here.
>>>>
>>>> The issues for a vendor are demand and supportability. Do people want
>>>> to pay for support of X? Can you honestly say you have expertise to
>>>> support and influence X over at least a major release cycle (12-18
>>>> months)? The latter needs a reasonably reliable roadmap and
>>>> continuity.
>>>>
>>>> I'm suggesting that in the current state, demand is low and going
>>>> down. The current code base seems de facto deprecated/unsupported
>>>> already, and possibly to be removed or dramatically changed into
>>>> something as-yet unclear. Nobody here seems to have taken a hard
>>>> decision regarding a next major release, but, the trajectory of that
>>>> decision seems clear if the current state remains the same.
>>>>
>>>> From my perspective, "middle-ground" new directions like adding a bit
>>>> of H2O, a bit of Spark, leaving bits of M/R code around, etc. are only
>>>> worse. I can see why there may be a little renewed demand for the new
>>>> bits, but then, why not go all in on one of them?
>>>>
>>>> Because a substantially all-new direction is a different story. If a
>>>> "Mahout2O" or "Spahout" ("Mark"?) emerges as a plan, I could imagine a
>>>> lot of renewed demand. And a clearer underlying roadmap sounds
>>>> possible. It would remain to be seen, but there's nothing stopping
>>>> those ideas from becoming part of a distro too.
>>>>
>>>>
>>>> On Mon, Apr 7, 2014 at 6:22 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>>>> Please be explicit here. It sounds like you are saying that if Mahout
>>>>> goes in the proposed new direction that Cloudera will drop Mahout.
>>>>>
>>>>> Is that what you mean to say?
>>
>> --------------------------------------------
>> Grant Ingersoll | @gsingers
>> http://www.lucidworks.com

--------------------------------------------
Grant Ingersoll | @gsingers
http://www.lucidworks.com