On 07.10.2011 01:59, Ted Dunning wrote:
> On Thu, Oct 6, 2011 at 4:53 PM, Lance Norskog <[email protected]> wrote:
> 
>> if/when Cloudera adds a Mahout version,

It's a when not an if :)

> I do think that we need to make an effort here.  

I totally agree. Some things we could start with:

I think Mahout is in a special situation, as most people only work on
the parts of the code where they are familiar with the theoretical
background. It's much harder to dive into different parts of the code
than it would be in other software projects. I have the impression that
this situation has led to different "coding styles per module". I think
we should aim for a kind of global refactoring and unification here. A
first step could be using AbstractJob consistently across the project,
for example.


It would also be great if we could introduce an annotation that
indicates how mature, stable and production ready a particular algorithm
implementation is.

While I think it's important that new stuff is coming to Mahout, we also
have to bear in mind that we have very different user groups.

Experimental implementations like the ParallelALSJob are very valuable
for academic users, for example (the GraphLab folks used it as a
baseline in a recent paper). On the other hand, I would discourage
enterprise users from putting it into production, as this job has some
major open issues and it seems that the MapReduce paradigm is not a very
good fit for that approach.

I have a similar feeling towards the graph stuff. Things like triangle
enumeration, for example, don't work very well on Hadoop. Nevertheless,
it's a good thing to have implementations of stuff like PageRank or
RandomWalkWithRestart, even if these problems seem to demand another
kind of underlying processing platform (maybe we'll support Giraph some
day?).

So I'd like to have an @Experimental annotation that indicates that an
algorithm is not yet production ready and might be subject to lots of
changes.
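
To make the idea a bit more concrete, here is a minimal sketch of what
such an annotation could look like. Nothing like this exists in Mahout
as far as this mail is concerned: the wrapper class ExperimentalDemo,
the optional value() note, the two Hypothetical* job classes and the
isExperimental() helper are all made up for illustration.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class ExperimentalDemo {

  /**
   * Hypothetical marker: the annotated implementation is not yet
   * production ready and might be subject to lots of changes.
   */
  @Documented
  @Retention(RetentionPolicy.RUNTIME)
  @Target({ElementType.TYPE, ElementType.METHOD})
  public @interface Experimental {
    /** Optional free-text note on why the code is experimental. */
    String value() default "";
  }

  /** Made-up stand-in for an experimental job. */
  @Experimental("major open issues; MapReduce may be a poor fit")
  public static class HypotheticalExperimentalJob {}

  /** Made-up stand-in for a mature job, deliberately not annotated. */
  public static class HypotheticalStableJob {}

  /** Returns true if the class itself carries @Experimental. */
  public static boolean isExperimental(Class<?> clazz) {
    return clazz.isAnnotationPresent(Experimental.class);
  }

  public static void main(String[] args) {
    System.out.println(isExperimental(HypotheticalExperimentalJob.class));
    System.out.println(isExperimental(HypotheticalStableJob.class));
  }
}
```

The runtime retention is an assumption on my side: it would let a
driver or a doc generator warn users when they launch an experimental
job. If we only want the marker to show up in the Javadoc, source or
class retention together with @Documented would be enough.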


> Besides, if they really care, it would be nice to have them show their faces
> on the mailing list.

To my knowledge, one of Cloudera's engineers who worked on the Mahout
integration is already subscribed here. I'll ask him to show his face :)

--sebastian
