Re: Board Report

Grant Ingersoll Mon, 07 Apr 2014 05:40:31 -0700

Good point, please update the report (you should have credentials)

-Grant


On Apr 7, 2014, at 5:06 AM, Sebastian Schelter <[email protected]> wrote:

> I think we should mention the redesign/rework of the website and the 
> completion of the move from the old wiki to Apache CMS.
> 
> --sebastian
> 
> On 04/07/2014 02:04 PM, Grant Ingersoll wrote:
>> Here is my proposed report.  For the most part, I think the only right thing 
>> to do vis-a-vis the Board is to report that we are in the midst of a healthy 
>> (yes, I believe it is, for the most part healthy and normal) discussion on 
>> where to go next.
>> 
>> PMC Members: this is checked into SVN at 
>> https://svn.apache.org/repos/asf/mahout/pmc/board-reports/2014/board-report-apr.txt.
>>   It is due on Wednesday.  If you object to this approach of reporting, 
>> please let me know ASAP and suggest alternatives.
>> 
>> === Apache Mahout Status Report: April 2014 ===
>> 
>> -----
>> 
>> Apache Mahout has implementations of a wide range of machine learning and
>> data mining algorithms: clustering, classification, collaborative filtering,
>> and frequent pattern mining.
>> 
>> Project Status
>> --------------
>> 
>> The project continues to have a large and active user base.  While
>> the developer base has continued to grow, there is a very active
>> and healthy debate going on about where Mahout goes next.  Please
>> see the Issues section below for more details.
>> 
>> Community
>> ---------
>> 
>> * Andrew Musselman was voted in as new committer.
>> * No changes to the PMC in the reporting period.
>> 
>> * The main issue concerning the community right now is the addition
>> of new contributions from 0xData and the integration of Mahout with Spark.
>> 
>> Community Objectives
>> --------------------
>> 
>> Our goal is to build scalable machine learning libraries. See the Issues
>> section below for the debate in the community about our objectives.
>> 
>> 
>> Releases
>> --------
>> 
>> In addition to an ongoing debate on Mahout's future, the community is
>> actively working on integrating Mahout with Scala/Spark, updating
>> documentation, and bringing in new code and committers to update the core
>> project.
>> 
>> 
>> Issues
>> ------
>> The Mahout community is at a crossroads in terms of where
>> to go next.  While the project has a large number of users and interested
>> parties, most committers are trying to maintain the code base on a purely
>> part-time basis, when the amount of work needed to sustain these users
>> clearly calls for full-time effort.  Furthermore, much of our original
>> code base is written
>> for Hadoop MapReduce 1.0, which many in the community have come to realize
>> is not well-suited for solving the kinds of problems that Mahout has set
>> out to solve.  There have been several lengthy discussions and prototypes
>> going on to work out next directions along the lines of the Spark and
>> 0xData contributions (there are numerous threads on the [email protected]
>> mailing list).
>> 
>> The PMC does not think this requires Board intervention at this time
>> as the debate is, as far as we can tell, healthy.  We do, however,
>> expect that this debate will take some time to resolve and may mean we
>> won't be shipping a 1.0 release any time soon.  We will keep the Board
>> apprised of our next steps as we work through the process.
>> 
>> 
>> 
>> 
>> On Apr 7, 2014, at 4:53 AM, Grant Ingersoll <[email protected]> wrote:
>> 
>>> To Sean's point, if Mahout were "my company", I would do the following 
>>> pragmatic, albeit not so pleasant, thing, assuming, of course, I had the 
>>> $$$ to do so:
>>> 
>>> 1. Clean up existing code with a laser focus on a few key areas 
>>> (Sebastian's list makes sense) using a part of the team and call it 1.0 and 
>>> ship it, as it has a number of users and they deserve to not have the rug 
>>> pulled out from under them.
>>> 
>>> 2. Spin out a subset of the team to explore and prototype 2.0 based on two 
>>> very positive and re-energizing looking ideas:
>>>     a. Scala DSL (and maybe Spark)
>>>     b. 0xData
>>>     
>>>     All of the work for #2 would be done in a clean repo and would only 
>>> bring in legacy code where it was truly beneficial (back compat. can come 
>>> later, if at all).
>>>     It would then benchmark those two approaches as well as look at where 
>>> they overlap and are mutually beneficial and then go forward with the 
>>> winner.
>>> 
>>> 3. Once #2 is viable, put most effort into it and maintain 1.0 with as 
>>> minimal support as possible, encouraging, nay -- actively helping -- 1.0 
>>> customers upgrade as quickly as possible.
>>> 
>>> The tricky part then becomes how do you make sure to still make your sales 
>>> #'s while also convincing them that your roadmap is what they are really 
>>> buying.
>>> 
>>> If I didn't have the $$$ to do both of these (i.e. we need a massive turn 
>>> around and we have one last shot), I would be all in on #2.
>>> 
>>> -----------------------------------
>>> 
>>> That being said, Mahout is not "my company".  Heck, Mahout is not even a 
>>> "company", so we don't need to be bound by company conventions and thought 
>>> processes, even if that fits with all of our individual day jobs.  And, 
>>> thankfully, we don't have any sales numbers to make.
>>> 
>>> We are chartered with one and only one mission: produce open source, 
>>> scalable machine learning libraries under the Apache license and community 
>>> driven principles.  We are not required by the Board or anyone else to 
>>> support version X for Y years or to use Hadoop or Scala or Java.  We are 
>>> also not required to implement any specific algorithms or deliver them on 
>>> specific time frames.  We are also not required to provide users upgrade 
>>> paths or the like.  Naturally, we _want_ to do these things for the sake of 
>>> the community, but let's be clear: it is not a requirement from the ASF.  
>>> We are, however, required to have a sustaining community.
>>> 
>>> ------------------------------------
>>> 
>>> I personally think we should start clean on #2, throwing off the shackles 
>>> of the past and emerge 6-9 months later with Mahout 2.0 (and yes, call it 
>>> that, not 0.1 as Sebastian suggests, for marketing reasons) built on a 
>>> completely new and fresh repository, likely bringing in only the 
>>> Math/collections underpinnings and maybe the build system.  This new 
>>> repository would have only a handful of core algorithms that we know are 
>>> well implemented, sustainable and best in class.
>>> 
>>> I think we should look at the lead up to 0.9 as an experiment that proved 
>>> out a lot of interesting ideas, including the fact that Mahout proved there 
>>> is vast interest in open source large scale machine learning and that it is 
>>> the benchmark for comparison.  Not many other ML projects can say that, 
>>> even if they have better technical implementations or are less fragmented.  
>>> Once you realize something has outlived its usefulness in software, 
>>> however, there is no point in lingering.
>>> 
>>> That being said, at least for the foreseeable future, I am not in a 
>>> position to contribute much code.  So, from my perspective, the ASF 
>>> Meritocratic approach takes over:  those who do the work make the 
>>> decisions.  If you want something in, then put up the patch and ask for 
>>> feedback.  If no one provides feedback, assume lazy consensus and move 
>>> forward.  Nothing convinces people better than actual, real, executing 
>>> code.  For my part, I am happy to continue to work the bureaucratic side of 
>>> things to make sure reports get filed, credentials get created, etc. and 
>>> the occasional patch.  I hope one day I will have time to contribute again.
>>> 
>>> I will follow up w/ a separate email on what I am going to put in the Board 
>>> Report.
>>>     
>>> On Apr 7, 2014, at 1:52 AM, Sean Owen <[email protected]> wrote:
>>> 
>>>> No, it's about the opposite. I'm referring to the default, current
>>>> state of play here.
>>>> 
>>>> The issues for a vendor are demand and supportability. Do people want
>>>> to pay for support of X? Can you honestly say you have expertise to
>>>> support and influence X over at least a major release cycle (12-18
>>>> months)? The latter needs a reasonably reliable roadmap and
>>>> continuity.
>>>> 
>>>> I'm suggesting that in the current state, demand is low and going
>>>> down. The current code base seems de facto deprecated/unsupported
>>>> already, and possibly to be removed or dramatically changed into
>>>> something as-yet unclear. Nobody here seems to have taken a hard
>>>> decision regarding a next major release, but, the trajectory of that
>>>> decision seems clear if the current state remains the same.
>>>> 
>>>> From my perspective, "middle-ground" new directions like adding a bit
>>>> of H2O, a bit of Spark, leaving bits of M/R code around, etc. are only
>>>> worse. I can see why there may be a little renewed demand for the new
>>>> bits, but then, why not go all in on one of them?
>>>> 
>>>> Because a substantially all-new direction is a different story. If a
>>>> "Mahout2O" or "Spahout" ("Mark"?) emerges as a plan, I could imagine a
>>>> lot of renewed demand. And a clearer underlying roadmap sounds
>>>> possible. It would remain to be seen, but there's nothing stopping
>>>> those ideas from becoming part of a distro too.
>>>> 
>>>> 
>>>> On Mon, Apr 7, 2014 at 6:22 AM, Ted Dunning <[email protected]> wrote:
>>>>> Please be explicit here.  It sounds like you are saying that if Mahout
>>>>> goes in the proposed new direction, Cloudera will drop Mahout.
>>>>> 
>>>>> Is that what you mean to say?
>>> 
>>> 
>> 
>> --------------------------------------------
>> Grant Ingersoll | @gsingers
>> http://www.lucidworks.com
>> 
>> 
>> 
>> 
>> 
> 

--------------------------------------------
Grant Ingersoll | @gsingers
http://www.lucidworks.com
