Thanks for this, Sebastian. I think the new direction is exciting, but indeed we should first focus on what we all agree on.
+1 on renaming core to mr-legacy, moving stuff out, and deprecating some of the algorithms. I would like to help with this restructuring.

Cheers,
Frank

On Tue, Apr 15, 2014 at 6:57 AM, Sebastian Schelter <[email protected]> wrote:

> Hi,
>
> From reading the thread, I have the impression that we agree on the
> following actions:
>
> * reject any future MR algorithm contributions, prominently state this
>   on the website and in talks
> * make all existing algorithm code compatible with Hadoop 2; if there is
>   no one willing to make an existing algorithm compatible, remove the
>   algorithm
> * deprecate Canopy clustering
> * email the original FPM and random forest authors to ask for maintenance
>   of the algorithms
> * rename core to "mr-legacy" (and gradually pull items we really need
>   out of that later)
>
> I will create JIRA tickets for those action points. I think the biggest
> challenge here is the Hadoop 2 compatibility. Is someone volunteering to
> drive that? Would be awesome.
>
> Best,
> Sebastian
>
> On 04/13/2014 07:19 PM, Andrew Musselman wrote:
>
>> This is a good summary of how I feel too.
>>
>> On Apr 13, 2014, at 10:15 AM, Sebastian Schelter <[email protected]> wrote:
>>>
>>> Unfortunately, it's not that easy to get enough voluntary work. I issued
>>> the third call for working on the documentation today as there are still
>>> lots of open issues. That's why I'm trying to suggest a move that involves
>>> as little work as possible.
>>>
>>> We should get the MR codebase into a state that we all can live with and
>>> then focus on new stuff like the Scala DSL.
>>>
>>> --sebastian
>>>
>>> On 04/13/2014 07:09 PM, Giorgio Zoppi wrote:
>>>> The best thing would be to make a plan and see how much effort you
>>>> need for this. Then find volunteers to accomplish the task. I am quite
>>>> sure that there are a lot of people out there who are willing to help out.
>>>>
>>>> BR,
>>>> deneb.
>>>>
>>>> 2014-04-13 18:45 GMT+02:00 Sebastian Schelter <[email protected]>:
>>>>
>>>>> Hi,
>>>>>
>>>>> I took some days to let the latest discussion about the state and
>>>>> future of Mahout go through my head. I think the most important thing
>>>>> to address right now is the MapReduce "legacy" codebase. A lot of the
>>>>> MR algorithms are currently unmaintained, documentation is outdated
>>>>> and the original authors have abandoned Mahout. For some algorithms it
>>>>> is hard to get even questions answered on the mailing list (e.g.
>>>>> RandomForest). I agree with Sean's comments that letting the code
>>>>> linger around is no option and will continue to harm Mahout.
>>>>>
>>>>> In the previous discussion, I suggested making a radical move and
>>>>> aiming to delete this codebase, but there were serious objections from
>>>>> committers and users that convinced me that there is still usage of
>>>>> and interest in that codebase.
>>>>>
>>>>> That puts us into a "legacy dilemma". We cannot delete the code without
>>>>> harming our userbase. On the other hand, I don't see anyone willing to
>>>>> rework the codebase. Further, the code cannot linger around anymore as
>>>>> it is doing now, especially when we fail to answer questions or don't
>>>>> provide documentation.
>>>>>
>>>>> *We have to make a move*!
>>>>>
>>>>> I suggest the following actions with regard to the MR codebase. I hope
>>>>> that they find consent. If there are objections, please give
>>>>> alternatives; *keeping everything as-is is not an option*:
>>>>>
>>>>> * reject any future MR algorithm contributions, prominently state
>>>>>   this on the website and in talks
>>>>> * make all existing algorithm code compatible with Hadoop 2; if
>>>>>   there is no one willing to make an existing algorithm compatible,
>>>>>   remove the algorithm
>>>>> * deprecate the existing MR algorithms, yet still take bug fix
>>>>>   contributions
>>>>> * remove Random Forest as we cannot even answer questions about the
>>>>>   implementation on the mailing list
>>>>>
>>>>> There are two more actions that I would like to see, but would be
>>>>> willing to give up if there are objections:
>>>>>
>>>>> * move the MR algorithms into a separate Maven module
>>>>> * remove Frequent Pattern Mining again (we already aimed for that in
>>>>>   0.9 but had one user who shouted but never returned to us)
>>>>>
>>>>> Let me know what you think.
>>>>>
>>>>> --sebastian
