Re: Board Report

Pat Ferrel Mon, 07 Apr 2014 10:30:11 -0700

The document does not mention the state of the existing Spark work in the 
snapshot codebase. Shouldn’t this be noted?


On Apr 7, 2014, at 5:06 AM, Sebastian Schelter <[email protected]> wrote:

I think we should mention the redesign/rework of the website and the completion 
of the move from the old wiki to Apache CMS.

--sebastian

On 04/07/2014 02:04 PM, Grant Ingersoll wrote:
> Here is my proposed report.  For the most part, I think the only right thing 
> to do vis-a-vis the Board is to report that we are in the midst of a healthy 
> (yes, I believe it is, for the most part, healthy and normal) discussion on 
> where to go next.
> 
> PMC Members: this is checked into SVN at 
> https://svn.apache.org/repos/asf/mahout/pmc/board-reports/2014/board-report-apr.txt.
> It is due on Wednesday.  If you object to this approach of reporting, 
> please let me know ASAP and suggest alternatives.
> 
> === Apache Mahout Status Report: April 2014 ===
> 
> -----
> 
> Apache Mahout has implementations of a wide range of machine learning and
> data mining algorithms: clustering, classification, collaborative filtering
> and frequent pattern mining.
> 
> Project Status
> --------------
> 
> The project continues to have a large and active user base.  While
> the developer base has continued to grow, there is a very active
> and healthy debate going on about where Mahout goes next.  Please
> see the Issues section below for more details.
> 
> Community
> ---------
> 
> * Andrew Musselman was voted in as a new committer.
> * No changes to the PMC in the reporting period.
> 
> * The main issue concerning the community right now is the addition
> of new contributions from 0xData and the integration of Mahout with Spark.
> 
> Community Objectives
> --------------------
> 
> Our goal is to build scalable machine learning libraries. See the Issues
> section below for the debate in the community about our objectives.
> 
> 
> Releases
> --------
> 
> In addition to an ongoing debate on Mahout's future, the community is actively
> working on integrating Mahout with Scala/Spark, updating
> documentation, and bringing in new code and committers to update the core 
> project.
> 
> 
> Issues
> ------
> The Mahout community is at a crossroads in terms of where
> to go next.  While the project has a large number of users and interested
> parties, most committers are trying to maintain the code base on a purely
> part-time basis, even though the amount of work needed to sustain these
> users clearly calls for full-time effort.  Furthermore, much of our original
> code base is written for Hadoop MapReduce 1.0, which many in the community
> have come to realize is not well-suited for solving the kinds of problems
> that Mahout has set out to solve.  There have been several lengthy
> discussions and prototypes exploring next directions along the lines of the
> Spark and 0xData contributions (there are numerous threads on the
> [email protected] mailing list).
> 
> The PMC does not think this requires Board intervention at this time
> as the debate is, as far as we can tell, healthy.  We do, however,
> expect that this debate will take some time to resolve and may mean we
> won't be shipping a 1.0 release any time soon.  We will keep the Board
> apprised of our next steps as we work through the process.
> 
> 
> 
> 
> On Apr 7, 2014, at 4:53 AM, Grant Ingersoll <[email protected]> wrote:
> 
>> To Sean's point, if Mahout were "my company", I would do the following 
>> (admittedly pragmatic and not so pleasant) thing, assuming, of course, I had 
>> the $$$ to do so:
>> 
>> 1. Clean up existing code with a laser focus on a few key areas (Sebastian's 
>> list makes sense) using a part of the team and call it 1.0 and ship it, as 
>> it has a number of users and they deserve to not have the rug pulled out 
>> from under them.
>> 
>> 2. Spin out a subset of the team to explore and prototype 2.0 based on two 
>> very positive and re-energizing ideas:
>>      a. Scala DSL (and maybe Spark)
>>      b. 0xData
>>      
>>      All of the work for #2 would be done in a clean repo and would only 
>> bring in legacy code where it was truly beneficial (back compat. can come 
>> later, if at all).
>>      This team would then benchmark those two approaches, look at where 
>> they overlap and are mutually beneficial, and then go forward with the winner.
>> 
>> 3. Once #2 is viable, put most effort into it and maintain 1.0 with as 
>> little support as possible, encouraging, nay -- actively helping -- 1.0 
>> customers to upgrade as quickly as possible.
>> 
>> The tricky part then becomes how to make sure you still make your sales 
>> #'s while also convincing customers that your roadmap is what they are 
>> really buying.
>> 
>> If I didn't have the $$$ to do both of these (i.e. we need a massive 
>> turnaround and we have one last shot), I would be all in on #2.
>> 
>> -----------------------------------
>> 
>> That being said, Mahout is not "my company".  Heck, Mahout is not even a 
>> "company", so we don't need to be bound by company conventions and thought 
>> processes, even if that fits with all of our individual day jobs.  And, 
>> thankfully, we don't have any sales numbers to make.
>> 
>> We are chartered with one and only one mission: produce open source, 
>> scalable machine learning libraries under the Apache license and community 
>> driven principles.  We are not required by the Board or anyone else to 
>> support version X for Y years or to use Hadoop or Scala or Java.  We are 
>> also not required to implement any specific algorithms or deliver them on 
>> specific time frames.  We are also not required to provide users upgrade 
>> paths or the like.  Naturally, we _want_ to do these things for the sake of 
>> the community, but let's be clear: it is not a requirement from the ASF.  We 
>> are, however, required to have a sustaining community.
>> 
>> ------------------------------------
>> 
>> I personally think we should start clean on #2, throwing off the shackles of 
>> the past, and emerge 6-9 months later with Mahout 2.0 (and yes, call it that, 
>> not 0.1 as Sebastian suggests, for marketing reasons) built on a completely 
>> new and fresh repository, likely bringing in only the Math/collections 
>> underpinnings and maybe the build system.  This new repository would have 
>> only a handful of core algorithms that we know are well implemented, 
>> sustainable and best in class.
>> 
>> I think we should look at the lead up to 0.9 as an experiment that proved 
>> out a lot of interesting ideas, including the fact that Mahout proved there 
>> is vast interest in open source large scale machine learning and that it is 
>> the benchmark for comparison.  Not many other ML projects can say that, even 
>> if they have better technical implementations or are less fragmented.  Once 
>> you realize something has outlived its usefulness in software, however, 
>> there is no point in lingering.
>> 
>> That being said, at least for the foreseeable future, I am not in a position 
>> to contribute much code.  So, from my perspective, the ASF meritocratic 
>> approach takes over:  those who do the work make the decisions.  If you want 
>> something in, then put up the patch and ask for feedback.  If no one 
>> provides feedback, assume lazy consensus and move forward.  Nothing 
>> convinces people better than actual, real, executing code.  For my part, I 
>> am happy to continue to work the bureaucratic side of things to make sure 
>> reports get filed, credentials get created, etc., and to contribute the 
>> occasional patch.  
>> I hope one day I will have time to contribute again.
>> 
>> I will follow up w/ a separate email on what I am going to put in the Board 
>> Report.
>>      
>> On Apr 7, 2014, at 1:52 AM, Sean Owen <[email protected]> wrote:
>> 
>>> No, it's about the opposite. I'm referring to the default, current
>>> state of play here.
>>> 
>>> The issues for a vendor are demand and supportability. Do people want
>>> to pay for support of X? Can you honestly say you have expertise to
>>> support and influence X over at least a major release cycle (12-18
>>> months)? The latter needs a reasonably reliable roadmap and
>>> continuity.
>>> 
>>> I'm suggesting that in the current state, demand is low and going
>>> down. The current code base seems de facto deprecated/unsupported
>>> already, and possibly to be removed or dramatically changed into
>>> something as-yet unclear. Nobody here seems to have taken a hard
>>> decision regarding a next major release, but, the trajectory of that
>>> decision seems clear if the current state remains the same.
>>> 
>>> From my perspective, "middle-ground" new directions like adding a bit
>>> of H2O, a bit of Spark, leaving bits of M/R code around, etc. are only
>>> worse. I can see why there may be a little renewed demand for the new
>>> bits, but then, why not go all in on one of them?
>>> 
>>> Because a substantially all-new direction is a different story. If a
>>> "Mahout2O" or "Spahout" ("Mark"?) emerges as a plan, I could imagine a
>>> lot of renewed demand. And a clearer underlying roadmap sounds
>>> possible. It would remain to be seen, but there's nothing stopping
>>> those ideas from becoming part of a distro too.
>>> 
>>> 
>>> On Mon, Apr 7, 2014 at 6:22 AM, Ted Dunning <[email protected]> wrote:
>>>> Please be explicit here.  It sounds like you are saying that if Mahout goes
>>>> in the proposed new direction that Cloudera will drop Mahout.
>>>> 
>>>> Is that what you mean to say?
>> 
>> 
> 
> --------------------------------------------
> Grant Ingersoll | @gsingers
> http://www.lucidworks.com
> 
> 
> 
> 
>
