This is a good summary of how I feel too.
> On Apr 13, 2014, at 10:15 AM, Sebastian Schelter <[email protected]> wrote:
>
> Unfortunately, it's not that easy to get enough volunteer work. I issued the
> third call for working on the documentation today, as there are still lots of
> open issues. That's why I'm trying to suggest a move that involves as little
> work as possible.
>
> We should get the MR codebase into a state that we all can live with and then
> focus on new stuff like the Scala DSL.
>
> --sebastian
>
>> On 04/13/2014 07:09 PM, Giorgio Zoppi wrote:
>> The best thing would be to make a plan and see how much effort you need for
>> this. Then find volunteers to accomplish the task. I'm quite sure that
>> there are a lot of people out there who are willing to help out.
>>
>> BR,
>> deneb.
>>
>>
>> 2014-04-13 18:45 GMT+02:00 Sebastian Schelter <[email protected]>:
>>
>>> Hi,
>>>
>>> I took some days to let the latest discussion about the state and future
>>> of Mahout go through my head. I think the most important thing to address
>>> right now is the MapReduce "legacy" codebase. A lot of the MR algorithms
>>> are currently unmaintained, documentation is outdated and the original
>>> authors have abandoned Mahout. For some algorithms it is hard to get even
>>> questions answered on the mailing list (e.g. RandomForest). I agree with
>>> Sean's comments that letting the code linger around is not an option and
>>> will continue to harm Mahout.
>>>
>>> In the previous discussion, I suggested making a radical move and aiming to
>>> delete this codebase, but there were serious objections from committers and
>>> users that convinced me that there is still usage of and interest in that
>>> codebase.
>>>
>>> That puts us in a "legacy dilemma". We cannot delete the code without
>>> harming our userbase. On the other hand, I don't see anyone willing to
>>> rework the codebase. Further, the code cannot linger around anymore as it
>>> is doing now, especially when we fail to answer questions or don't provide
>>> documentation.
>>>
>>> *We have to make a move*!
>>>
>>> I suggest the following actions with regard to the MR codebase. I hope
>>> that they find consensus. If there are objections, please give alternatives;
>>> *keeping everything as-is is not an option*:
>>>
>>> * reject any future MR algorithm contributions, prominently state this on
>>> the website and in talks
>>> * make all existing algorithm code compatible with Hadoop 2; if there is
>>> no one willing to make an existing algorithm compatible, remove the
>>> algorithm
>>> * deprecate the existing MR algorithms, yet still take bug fix
>>> contributions
>>> * remove Random Forest, as we cannot even answer questions about the
>>> implementation on the mailing list
>>>
>>> There are two more actions that I would like to see, but would be willing
>>> to give up if there are objections:
>>>
>>> * move the MR algorithms into a separate Maven module
>>> * remove Frequent Pattern Mining again (we already aimed for that in 0.9
>>> but had one user who shouted but never returned to us)
>>>
>>> Let me know what you think.
>>>
>>> --sebastian
