Re: Tackling the "legacy dilemma"

Andrew Musselman Sun, 13 Apr 2014 10:20:06 -0700

This is a good summary of how I feel too.


> On Apr 13, 2014, at 10:15 AM, Sebastian Schelter <[email protected]> wrote:
> 
> Unfortunately, it's not that easy to get enough voluntary work. I issued the 
> third call for working on the documentation today as there are still lots of 
> open issues. That's why I'm trying to suggest a move that involves as little 
> work as possible.
> 
> We should get the MR codebase into a state that we all can live with and then 
> focus on new stuff like the Scala DSL.
> 
> --sebastian
> 
> 
> 
> 
>> On 04/13/2014 07:09 PM, Giorgio Zoppi wrote:
>> The best thing would be to make a plan and see how much effort you need for
>> this. Then find volunteers to accomplish the task. I'm quite sure that
>> there are a lot of people out there who are willing to help out.
>> 
>> BR,
>> deneb.
>> 
>> 
>> 2014-04-13 18:45 GMT+02:00 Sebastian Schelter <[email protected]>:
>> 
>>> Hi,
>>> 
>>> I took some days to let the latest discussion about the state and future
>>> of Mahout go through my head. I think the most important thing to address
>>> right now is the MapReduce "legacy" codebase. A lot of the MR algorithms
>>> are currently unmaintained, documentation is outdated and the original
>>> authors have abandoned Mahout. For some algorithms it is hard to even get
>>> questions answered on the mailing list (e.g. RandomForest). I agree with
>>> Sean's comments that letting the code linger around is not an option and will
>>> continue to harm Mahout.
>>> 
>>> In the previous discussion, I suggested making a radical move and aiming to
>>> delete this codebase, but there were serious objections from committers and
>>> users that convinced me that there is still usage of and interest in that
>>> codebase.
>>> 
>>> That puts us into a "legacy dilemma". We cannot delete the code without
>>> harming our userbase. On the other hand, I don't see anyone willing to
>>> rework the codebase. Further, the code cannot linger around anymore as it
>>> is doing now, especially when we fail to answer questions or don't provide
>>> documentation.
>>> 
>>> *We have to make a move*!
>>> 
>>> I suggest the following actions with regard to the MR codebase. I hope
>>> that they find consensus. If there are objections, please give alternatives;
>>> *keeping everything as-is is not an option*:
>>> 
>>>  * reject any future MR algorithm contributions, prominently state this on
>>> the website and in talks
>>>  * make all existing algorithm code compatible with Hadoop 2; if there is
>>> no one willing to make an existing algorithm compatible, remove the
>>> algorithm
>>>  * deprecate the existing MR algorithms, yet still take bug fix
>>> contributions
>>>  * remove Random Forest as we cannot even answer questions about the
>>> implementation on the mailing list
>>> 
>>> There are two more actions that I would like to see, but would be willing
>>> to give up if there are objections:
>>> 
>>>  * move the MR algorithms into a separate Maven module
>>>  * remove Frequent Pattern Mining again (we already aimed for that in 0.9
>>> but had one user who shouted but never returned to us)
>>> 
>>> Let me know what you think.
>>> 
>>> --sebastian
>
