Re: Tackling the "legacy dilemma"

Manoj Awasthi Tue, 15 Apr 2014 19:52:30 -0700

>  * remove Random Forest as we cannot even answer questions to the
> implementation on the mailinglist
>
     -1 to removing the present Random Forests. I think it is being used - we
(at Adobe) are playing around with it a bit.  If the reason for removal is
that there is no active maintainer, that can be resolved by the people using
it getting more active on this - a community action. FWIW, I vote against
throwing away this code.




On Tue, Apr 15, 2014 at 2:38 PM, Sebastian Schelter <[email protected]> wrote:

> On 04/15/2014 11:07 AM, Suneel Marthi wrote:
>
>> On Tue, Apr 15, 2014 at 12:57 AM, Sebastian Schelter <[email protected]>
>> wrote:
>>
>>  Hi,
>>>
>>>  From reading the thread, I have the impression that we agree on the
>>> following actions:
>>>
>>>
>>>   * reject any future MR algorithm contributions, prominently state this
>>> on the website and in talks
>>>   * make all existing algorithm code compatible with Hadoop 2, if there
>>> is
>>> no one willing to make an existing algorithm compatible, remove the
>>> algorithm
>>>   * deprecate Canopy clustering
>>>   * email the original FPM and random forest authors to ask for
>>> maintenance
>>> of the algorithms
>>>   * rename core to "mr-legacy" (and  gradually pull items we really need
>>> out of that later)
>>>
>>> I will create jira tickets for those action points. I think the biggest
>>> challenge here is the Hadoop 2 compatibility, is someone volunteering to
>>> drive that? Would be awesome.
>>>
>>>
>> With things settling down at work for me, I have time now to dedicate back
>> to Mahout. I can drive this effort.
>>
>
> That is great news!
>
>
>
>>
>>> Best,
>>> Sebastian
>>>
>>>
>>> On 04/13/2014 07:19 PM, Andrew Musselman wrote:
>>>
>>>  This is a good summary of how I feel too.
>>>>
>>>>   On Apr 13, 2014, at 10:15 AM, Sebastian Schelter <[email protected]>
>>>> wrote:
>>>>
>>>>>
>>>>> Unfortunately, it's not that easy to get enough voluntary work. I issued
>>>>> the third call for working on the documentation today as there are
>>>>> still lots of open issues. That's why I'm trying to suggest a move that
>>>>> involves as little work as possible.
>>>>>
>>>>> We should get the MR codebase into a state that we all can live with
>>>>> and
>>>>> then focus on new stuff like the scala DSL.
>>>>>
>>>>> --sebastian
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>   On 04/13/2014 07:09 PM, Giorgio Zoppi wrote:
>>>>>
>>>>>> The best thing would be to make a plan and see how much effort you
>>>>>> need for this. Then find volunteers to accomplish the task. I am quite
>>>>>> sure that there are a lot of people out there who are willing to help
>>>>>> out.
>>>>>>
>>>>>> BR,
>>>>>> deneb.
>>>>>>
>>>>>>
>>>>>> 2014-04-13 18:45 GMT+02:00 Sebastian Schelter <[email protected]>:
>>>>>>
>>>>>>
>>>>>>   Hi,
>>>>>>
>>>>>>>
>>>>>>> I took some days to let the latest discussion about the state and
>>>>>>> future
>>>>>>> of Mahout go through my head. I think the most important thing to
>>>>>>> address
>>>>>>> right now is the MapReduce "legacy" codebase. A lot of the MR
>>>>>>> algorithms
>>>>>>> are currently unmaintained, documentation is outdated and the
>>>>>>> original
>>>>>>> authors have abandoned Mahout. For some algorithms it is hard to get
>>>>>>> even
>>>>>>> questions answered on the mailinglist (e.g. RandomForest). I agree
>>>>>>> with
>>>>>>> Sean's comments that letting the code linger around is no option and
>>>>>>> will
>>>>>>> continue to harm Mahout.
>>>>>>>
>>>>>>> In the previous discussion, I suggested making a radical move and
>>>>>>> aiming to delete this codebase, but there were serious objections from
>>>>>>> committers and users that convinced me that there is still usage of
>>>>>>> and interest in that codebase.
>>>>>>>
>>>>>>> That puts us into a "legacy dilemma". We cannot delete the code
>>>>>>> without
>>>>>>> harming our userbase. On the other hand, I don't see anyone willing
>>>>>>> to
>>>>>>> rework the codebase. Further, the code cannot linger around anymore
>>>>>>> as
>>>>>>> it
>>>>>>> is doing now, especially when we fail to answer questions or don't
>>>>>>> provide
>>>>>>> documentation.
>>>>>>>
>>>>>>> *We have to make a move*!
>>>>>>>
>>>>>>> I suggest the following actions with regard to the MR codebase. I
>>>>>>> hope
>>>>>>> that they find consent. If there are objections, please give
>>>>>>> alternatives,
>>>>>>> *keeping everything as-is is not an option*:
>>>>>>>
>>>>>>>    * reject any future MR algorithm contributions, prominently state
>>>>>>> this on
>>>>>>> the website and in talks
>>>>>>>    * make all existing algorithm code compatible with Hadoop 2, if
>>>>>>> there is
>>>>>>> no one willing to make an existing algorithm compatible, remove the
>>>>>>> algorithm
>>>>>>>    * deprecate the existing MR algorithms, yet still take bug fix
>>>>>>> contributions
>>>>>>>    * remove Random Forest as we cannot even answer questions to the
>>>>>>> implementation on the mailinglist
>>>>>>>
>>>>>>> There are two more actions that I would like to see, but would be
>>>>>>> willing to give up if there are objections:
>>>>>>>
>>>>>>>    * move the MR algorithms into a separate maven module
>>>>>>>    * remove Frequent Pattern Mining again (we already aimed for that
>>>>>>> in
>>>>>>> 0.9
>>>>>>> but had one user who shouted but never returned to us)
>>>>>>>
>>>>>>> Let me know what you think.
>>>>>>>
>>>>>>> --sebastian
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>
>>
>
