Re: Tackling the "legacy dilemma"

Frank Scholten Thu, 17 Apr 2014 11:51:57 -0700

Thanks for this, Sebastian. I think the new direction is exciting, but indeed
we should first focus on what we all agree on.


+1 on renaming of core to mr-legacy and moving stuff out and deprecating
some of the algorithms.

I would like to help with this restructuring.

Cheers,

Frank

On Tue, Apr 15, 2014 at 6:57 AM, Sebastian Schelter <[email protected]> wrote:

> Hi,
>
> From reading the thread, I have the impression that we agree on the
> following actions:
>
>
>  * reject any future MR algorithm contributions, prominently state this
> on the website and in talks
>  * make all existing algorithm code compatible with Hadoop 2; if no one is
> willing to make an existing algorithm compatible, remove the
> algorithm
>  * deprecate Canopy clustering
>  * email the original FPM and random forest authors to ask for maintenance
> of the algorithms
>  * rename core to "mr-legacy" (and gradually pull items we really need
> out of that later)
>
> I will create JIRA tickets for those action points. I think the biggest
> challenge here is the Hadoop 2 compatibility. Is anyone volunteering to
> drive that? That would be awesome.
>
> Best,
> Sebastian
>
>
>
> On 04/13/2014 07:19 PM, Andrew Musselman wrote:
>
>> This is a good summary of how I feel too.
>>
>>  On Apr 13, 2014, at 10:15 AM, Sebastian Schelter <[email protected]> wrote:
>>>
>>> Unfortunately, it's not that easy to get enough volunteer work. I issued
>>> the third call for working on the documentation today as there are still
>>> lots of open issues. That's why I'm trying to suggest a move that involves
>>> as little work as possible.
>>>
>>> We should get the MR codebase into a state that we can all live with and
>>> then focus on new stuff like the Scala DSL.
>>>
>>> --sebastian
>>>
>>>
>>>
>>>
>>>  On 04/13/2014 07:09 PM, Giorgio Zoppi wrote:
>>>> The best thing would be to make a plan and see how much effort is needed
>>>> for this. Then find volunteers to accomplish the task. I am quite sure that
>>>> there are a lot of people out there who are willing to help out.
>>>>
>>>> BR,
>>>> deneb.
>>>>
>>>>
>>>> 2014-04-13 18:45 GMT+02:00 Sebastian Schelter <[email protected]>:
>>>>
>>>>  Hi,
>>>>>
>>>>> I took some days to let the latest discussion about the state and
>>>>> future
>>>>> of Mahout go through my head. I think the most important thing to
>>>>> address
>>>>> right now is the MapReduce "legacy" codebase. A lot of the MR
>>>>> algorithms
>>>>> are currently unmaintained, documentation is outdated and the original
>>>>> authors have abandoned Mahout. For some algorithms it is hard to get
>>>>> even
>>>>> questions answered on the mailing list (e.g. RandomForest). I agree with
>>>>> Sean's comments that letting the code linger around is not an option and
>>>>> will
>>>>> continue to harm Mahout.
>>>>>
>>>>> In the previous discussion, I suggested making a radical move and
>>>>> aiming to delete this codebase, but there were serious objections from
>>>>> committers and users that convinced me that there is still usage of and
>>>>> interest in that codebase.
>>>>>
>>>>> That puts us into a "legacy dilemma". We cannot delete the code without
>>>>> harming our userbase. On the other hand, I don't see anyone willing to
>>>>> rework the codebase. Further, the code cannot linger around anymore as
>>>>> it
>>>>> is doing now, especially when we fail to answer questions or don't
>>>>> provide
>>>>> documentation.
>>>>>
>>>>> *We have to make a move*!
>>>>>
>>>>> I suggest the following actions with regard to the MR codebase. I hope
>>>>> that they find consensus. If there are objections, please give
>>>>> alternatives;
>>>>> *keeping everything as-is is not an option*:
>>>>>
>>>>>   * reject any future MR algorithm contributions, prominently state
>>>>> this on
>>>>> the website and in talks
>>>>>   * make all existing algorithm code compatible with Hadoop 2; if
>>>>> no one is willing to make an existing algorithm compatible, remove the
>>>>> algorithm
>>>>>   * deprecate the existing MR algorithms, yet still take bug fix
>>>>> contributions
>>>>>   * remove Random Forest as we cannot even answer questions about the
>>>>> implementation on the mailing list
>>>>>
>>>>> There are two more actions that I would like to see, but would be
>>>>> willing to give up if there are objections:
>>>>>
>>>>>   * move the MR algorithms into a separate maven module
>>>>>   * remove Frequent Pattern Mining again (we already aimed for that in
>>>>> 0.9
>>>>> but had one user who shouted but never returned to us)
>>>>>
>>>>> Let me know what you think.
>>>>>
>>>>> --sebastian
>>>>>
>>>>
>>>
>
