Re: Tackling the "legacy dilemma"

Suneel Marthi Sun, 13 Apr 2014 10:16:14 -0700

I meant to deprecate first (and eventually remove) Canopy clustering. This
is in line with the conversation I had with Ted and Frank at AMS about
weaning users away from the old style Canopy->KMeans clustering to start
using Streaming KMeans. No point in keeping Canopy once users switch to
using Streaming KMeans.



On Sun, Apr 13, 2014 at 1:12 PM, Sebastian Schelter <[email protected]> wrote:

> Do you mean deprecating or removing Canopy clustering? I suggest to
> deprecate all MR code anyways.
>
> --sebastian
>
>
>
> On 04/13/2014 07:11 PM, Suneel Marthi wrote:
>
>  If I may add deprecating Canopy clustering to the list once we get
>> Streaming KMeans working right.
>>
>> On Sun, Apr 13, 2014 at 12:45 PM, Sebastian Schelter <[email protected]>
>> wrote:
>>
>>  Hi,
>>>
>>> I took some days to let the latest discussion about the state and future
>>> of Mahout go through my head. I think the most important thing to address
>>> right now is the MapReduce "legacy" codebase. A lot of the MR algorithms
>>> are currently unmaintained, documentation is outdated and the original
>>> authors have abandoned Mahout. For some algorithms it is hard to get even
>>> questions answered on the mailing list (e.g. RandomForest). I agree with
>>> Sean's comments that letting the code linger around is not an option and will
>>> continue to harm Mahout.
>>>
>>> In the previous discussion, I suggested to make a radical move and aim to
>>> delete this codebase, but there were serious objections from committers
>>> and users that convinced me that there is still usage of and interest in
>>> that codebase.
>>>
>>> That puts us into a "legacy dilemma". We cannot delete the code without
>>> harming our userbase. On the other hand, I don't see anyone willing to
>>> rework the codebase. Further, the code cannot linger around anymore as it
>>> is doing now, especially when we fail to answer questions or don't
>>> provide
>>> documentation.
>>>
>>> *We have to make a move*!
>>>
>>> I suggest the following actions with regard to the MR codebase. I hope
>>> that they find consent. If there are objections, please give
>>> alternatives,
>>> *keeping everything as-is is not an option*:
>>>
>>>   * reject any future MR algorithm contributions, prominently state this
>>> on
>>> the website and in talks
>>>
>>>       +1, this includes the new Frequent Pattern mining impl which is MR
>> based that was provided as a patch a few months ago
>>
>>    * make all existing algorithm code compatible with Hadoop 2, if there
>>> is
>>> no one willing to make an existing algorithm compatible, remove the
>>> algorithm
>>>
>>>        +1. One of the questions I got asked when 0.9 was released was
>> 'when
>> is Mahout gonna be compatible with Yarn and Hadoop 2'?  We should target
>> that for the next major/interim release.
>>
>>    * deprecate the existing MR algorithms, yet still take bug fix
>>> contributions
>>>
>>>        I guess we'll be removing these in some future release, until
>> then we
>> keep absorbing bug fixes ??
>>
>>
>>    * remove Random Forest as we cannot even answer questions about the
>>> implementation on the mailing list
>>>
>>>        +1 to removing present Random Forests. Andy Twigg had provided a
>> Spark
>> based Streaming Random Forests impl sometime last year. It's time to
>> restart
>> that conversation and integrate that into the codebase if the contributor
>> is still willing i.e.
>>
>>
>>> There are two more actions that I would like to see, but would be willing to
>>> give up if there are objections:
>>>
>>>   * move the MR algorithms into a separate maven module
>>>
>>>         +1
>>
>>    * remove Frequent Pattern Mining again (we already aimed for that in
>>> 0.9
>>> but had one user who shouted but never returned to us)
>>>
>>>        This thing annoys me the most. We had removed this from 0.9 but
>> yet
>> restored it only because some user wanted it and promised to support it.
>> We
>> have not heard from the user again.
>>        It's got old MR code that we don't support anymore and this should
>> be
>> purged ASAP.
>>
>>
>>
>>  Let me know what you think.
>>>
>>> --sebastian
>>>
>>>
>>
>
