On 07.10.2011 01:59, Ted Dunning wrote:
> On Thu, Oct 6, 2011 at 4:53 PM, Lance Norskog <[email protected]> wrote:
>
>> if/when Cloudera adds a Mahout version,
It's a when, not an if :)

> I do think that we need to make an effort here.

I totally agree. Some things we could start with:

I think Mahout is in a special situation, as most people only work on the parts of the code where they are familiar with the theoretical background. It's much harder to dive into different parts of the code than it would be in other software projects. I have the impression that this situation has led to different "coding styles per module". I think we should tackle a kind of global refactoring and unification here; a start could be using AbstractJob across the project, for example.

It would also be great if we could introduce an annotation that indicates how mature, stable and production-ready a particular algorithm implementation is. While I think it's important that new stuff keeps coming to Mahout, we also have to bear in mind that we have very different user groups. Experimental implementations like the ParallelALSJob are very valuable for academic users, for example (the GraphLab folks used it as a baseline in a recent paper). On the other hand, I would discourage enterprise users from taking it into production, as this job has some major open issues and it seems that the MapReduce paradigm is not a very good fit for that approach.

I have a similar feeling about the graph stuff; things like triangle enumeration, for example, don't work very well on Hadoop. Nevertheless, it's a good thing to have implementations of things like PageRank or RandomWalkWithRestart, even if these problems seem to demand another kind of underlying processing platform (maybe we'll support Giraph some day?).

So I'd like to have an @Experimental annotation that indicates that an algorithm is not yet production-ready and might be subject to lots of changes.

> Besides, if they really care, it would be nice to have them show their faces
> on the mailing list.

To my knowledge, one of Cloudera's engineers who worked on the Mahout integration is already subscribed here. I'll ask him to show his face :)

--sebastian
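
A minimal sketch of how the @Experimental annotation suggested above could look in Java. Everything here is an assumption for illustration: the retention and target chosen for the annotation, its optional note, and the ExampleAlsJob, StableJob and Maturity names are hypothetical and not part of Mahout.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/**
 * Hypothetical marker for implementations that are not yet production ready
 * and might be subject to lots of changes.
 */
@Documented                          // show the marker in generated Javadoc
@Retention(RetentionPolicy.RUNTIME)  // assumption: keep it visible to tools via reflection
@Target(ElementType.TYPE)            // assumption: applied to whole job/algorithm classes
@interface Experimental {
  /** Optional free-text note on why the implementation is experimental. */
  String value() default "";
}

/** Hypothetical job standing in for an experimental algorithm implementation. */
@Experimental("major open issues; MapReduce may be a poor fit")
class ExampleAlsJob {
}

/** Hypothetical job without the marker, i.e. considered stable. */
class StableJob {
}

/** Hypothetical helper that a driver or documentation tool could use. */
final class Maturity {

  private Maturity() {
  }

  /** Returns true if the given class carries the Experimental marker. */
  static boolean isExperimental(Class<?> type) {
    return type.isAnnotationPresent(Experimental.class);
  }

  /** Returns the note attached to the marker, or an empty string if absent. */
  static String note(Class<?> type) {
    Experimental marker = type.getAnnotation(Experimental.class);
    return marker == null ? "" : marker.value();
  }
}
```

With runtime retention, a job launcher could call Maturity.isExperimental(...) and log a warning before running such a job; a source-only marker would be enough if the goal is just documentation.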
