Ok - that makes sense. Thanks.
On Wed, Apr 16, 2014 at 8:29 AM, Suneel Marthi <[email protected]> wrote:

> The plan is to replace the existing Random Forests impl with a Spark-based
> Streaming Random Forests. As ssc had already mentioned, the plan is not to
> entertain any new MR impls, but to accept bug fixes for existing ones.
>
> The consensus is to do away with the existing MapReduce RF once the
> Spark-based Streaming Random Forests is in place.
>
> On Tue, Apr 15, 2014 at 10:51 PM, Manoj Awasthi <[email protected]> wrote:
>
>> > * remove Random Forest as we cannot even answer questions about the
>> > implementation on the mailing list
>>
>> -1 to removing the present Random Forests. I think it is being used - we
>> (at Adobe) are playing around with it a bit. If the reason for removal is
>> that there is no active maintainer, that can be resolved by people using
>> it getting more active on this - a community action. FWIW, I vote against
>> throwing away this code.
>>
>> On Tue, Apr 15, 2014 at 2:38 PM, Sebastian Schelter <[email protected]> wrote:
>>
>>> On 04/15/2014 11:07 AM, Suneel Marthi wrote:
>>>
>>>> On Tue, Apr 15, 2014 at 12:57 AM, Sebastian Schelter <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> From reading the thread, I have the impression that we agree on the
>>>>> following actions:
>>>>>
>>>>> * reject any future MR algorithm contributions, and prominently state
>>>>>   this on the website and in talks
>>>>> * make all existing algorithm code compatible with Hadoop 2; if there
>>>>>   is no one willing to make an existing algorithm compatible, remove
>>>>>   the algorithm
>>>>> * deprecate Canopy clustering
>>>>> * email the original FPM and Random Forest authors to ask for
>>>>>   maintenance of the algorithms
>>>>> * rename core to "mr-legacy" (and gradually pull items we really need
>>>>>   out of that later)
>>>>>
>>>>> I will create jira tickets for those action points. I think the
>>>>> biggest challenge here is the Hadoop 2 compatibility - is someone
>>>>> volunteering to drive that? Would be awesome.
>>>>>
>>>> With things settling down at work for me, I have time now to dedicate
>>>> back to Mahout. I can drive this effort.
>>>>
>>> That is great news!
>>>
>>>>> Best,
>>>>> Sebastian
>>>>>
>>>>> On 04/13/2014 07:19 PM, Andrew Musselman wrote:
>>>>>
>>>>>> This is a good summary of how I feel too.
>>>>>>
>>>>>> On Apr 13, 2014, at 10:15 AM, Sebastian Schelter <[email protected]> wrote:
>>>>>>
>>>>>>> Unfortunately, it's not that easy to get enough voluntary work. I
>>>>>>> issued the third call for working on the documentation today, as
>>>>>>> there are still lots of open issues. That's why I'm trying to
>>>>>>> suggest a move that involves as little work as possible.
>>>>>>>
>>>>>>> We should get the MR codebase into a state that we all can live
>>>>>>> with and then focus on new stuff like the Scala DSL.
>>>>>>>
>>>>>>> --sebastian
>>>>>>>
>>>>>>> On 04/13/2014 07:09 PM, Giorgio Zoppi wrote:
>>>>>>>
>>>>>>>> The best thing would be to make a plan and see how much effort is
>>>>>>>> needed for this, then find volunteers to accomplish the task. I am
>>>>>>>> quite sure there are a lot of people out there who are willing to
>>>>>>>> help out.
>>>>>>>>
>>>>>>>> BR,
>>>>>>>> deneb.
>>>>>>>>
>>>>>>>> 2014-04-13 18:45 GMT+02:00 Sebastian Schelter <[email protected]>:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I took some days to let the latest discussion about the state and
>>>>>>>>> future of Mahout go through my head. I think the most important
>>>>>>>>> thing to address right now is the MapReduce "legacy" codebase. A
>>>>>>>>> lot of the MR algorithms are currently unmaintained, the
>>>>>>>>> documentation is outdated, and the original authors have abandoned
>>>>>>>>> Mahout. For some algorithms it is hard to even get questions
>>>>>>>>> answered on the mailing list (e.g. Random Forest). I agree with
>>>>>>>>> Sean's comments that letting the code linger around is no option
>>>>>>>>> and will continue to harm Mahout.
>>>>>>>>>
>>>>>>>>> In the previous discussion, I suggested making a radical move and
>>>>>>>>> aiming to delete this codebase, but there were serious objections
>>>>>>>>> from committers and users that convinced me that there is still
>>>>>>>>> usage of, and interest in, that codebase.
>>>>>>>>>
>>>>>>>>> That puts us into a "legacy dilemma". We cannot delete the code
>>>>>>>>> without harming our userbase. On the other hand, I don't see
>>>>>>>>> anyone willing to rework the codebase. Further, the code cannot
>>>>>>>>> linger around anymore as it is doing now, especially when we fail
>>>>>>>>> to answer questions or don't provide documentation.
>>>>>>>>>
>>>>>>>>> *We have to make a move*!
>>>>>>>>>
>>>>>>>>> I suggest the following actions with regard to the MR codebase. I
>>>>>>>>> hope that they find consent. If there are objections, please give
>>>>>>>>> alternatives; *keeping everything as-is is not an option*:
>>>>>>>>>
>>>>>>>>> * reject any future MR algorithm contributions, and prominently
>>>>>>>>>   state this on the website and in talks
>>>>>>>>> * make all existing algorithm code compatible with Hadoop 2; if
>>>>>>>>>   there is no one willing to make an existing algorithm
>>>>>>>>>   compatible, remove the algorithm
>>>>>>>>> * deprecate the existing MR algorithms, yet still take bug fix
>>>>>>>>>   contributions
>>>>>>>>> * remove Random Forest, as we cannot even answer questions about
>>>>>>>>>   the implementation on the mailing list
>>>>>>>>>
>>>>>>>>> There are two more actions that I would like to see, but I'd be
>>>>>>>>> willing to give up if there are objections:
>>>>>>>>>
>>>>>>>>> * move the MR algorithms into a separate maven module
>>>>>>>>> * remove Frequent Pattern Mining again (we already aimed for that
>>>>>>>>>   in 0.9 but had one user who shouted but never returned to us)
>>>>>>>>>
>>>>>>>>> Let me know what you think.
>>>>>>>>>
>>>>>>>>> --sebastian
