Re: Any plans for new clustering algorithms?

Sean Owen Mon, 21 Apr 2014 10:24:06 -0700

On Mon, Apr 21, 2014 at 6:03 PM, Paul Brown <p...@mult.ifario.us> wrote:
> - MLlib as Mahout.next would be unfortunate.  There are some gems in
> Mahout, but there are also lots of rocks.  Setting a minimal bar of
> working, correctly implemented, and documented requires a surprising amount
> of work.


As someone with first-hand knowledge, I can say this is correct. To Sang's
question, I can't see value in 'porting' Mahout, since it is based on a
quite different paradigm. About the only part that translates is the
algorithm concept itself.

This is also the cautionary tale. The project's contents ended up
being a number of "drive-by" contributions of implementations that,
while individually perhaps brilliant (perhaps), didn't necessarily
match any other implementation in structure, input/output, or
libraries used. The implementations were often a touch academic. The
result was hard to document, maintain, evolve, or use.

Far more of the structure of the MLlib implementations is consistent,
by virtue of being built around Spark core already. That's great.

One can't wait until the foundation is completely built before writing
any implementations. To me, the existing implementations are almost
exactly the basics I would choose. They cover the bases and will
exercise the abstractions and structure. So that's also great, IMHO.
