Hi,
I've taken a few days to let the latest discussion about the state and
future of Mahout sink in. I think the most important thing to
address right now is the MapReduce "legacy" codebase. A lot of the MR
algorithms are currently unmaintained, documentation is outdated and the
original authors have abandoned Mahout. For some algorithms (e.g.
RandomForest), it is hard even to get questions answered on the mailing
list. I agree with Sean's comments that letting the code linger is not an
option and will continue to harm Mahout.
In the previous discussion, I suggested making a radical move and aiming
to delete this codebase, but there were serious objections from
committers and users that convinced me that there is still usage of and
interest in that codebase.
That puts us in a "legacy dilemma". We cannot delete the code without
harming our userbase. On the other hand, I don't see anyone willing to
rework the codebase. Furthermore, the code cannot keep lingering as it
does now, especially when we fail to answer questions or don't
provide documentation.
*We have to make a move*!
I suggest the following actions with regard to the MR codebase. I hope
that they find consensus. If there are objections, please suggest
alternatives; *keeping everything as-is is not an option*:
* reject any future MR algorithm contributions, and state this
prominently on the website and in talks
* make all existing algorithm code compatible with Hadoop 2; if no one
is willing to make a particular algorithm compatible, remove that
algorithm
* deprecate the existing MR algorithms, but still accept bug-fix
contributions
* remove Random Forest, as we cannot even answer questions about its
implementation on the mailing list
There are two more actions that I would like to see, but would be willing
to give up if there are objections:
* move the MR algorithms into a separate maven module
* remove Frequent Pattern Mining again (we already aimed for that in
0.9, but one user objected and then never got back to us)
Let me know what you think.
--sebastian