Re: What is Mahout?

Sean Owen Wed, 25 Feb 2015 12:22:16 -0800

My $0.02:

There is no shortage of algorithm libraries out there that are in some way
runnable on Hadoop, but there are not nearly as many easy-to-use distributed
matrix operation libraries. I think it's more additive to the
ecosystem to solve that narrow, and deep, linear algebra problem and
really nail it. That's a pretty good 'identity' to claim. It seems
like an appropriate scope.


I do think the project has changed so much that it's more confusing to
keep calling it Mahout than to change the name. I can't think of one
person I've talked to about Mahout in the last 6 months that was not
under the impression that what is in 0.9 has simply been ported to
Spark. It's different enough that it could even be its own incubator
project (under a different name).

The brand recognition is for the deprecated part, so keeping the name is
almost the problem. It's not crazy to just change the name, or even to
consider a re-incubation. That might give some latitude to more fully
reboot.

Releasing 1.0.0, on the other hand, means committing to the APIs (and
name) for some fairly new code, and fairly soon. Given that this is
sort of a 0.1 of a new project, going to 1.0 feels semantically wrong.
But a release would be good. Personally I'd suggest 0.10.

On Wed, Feb 25, 2015 at 5:50 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
> Looking back over the last year Mahout has gone through a lot of changes. 
> Most users are still using the legacy mapreduce code and new users have 
> mostly looked elsewhere.
>
> The fact that people as knowledgeable as former committers compare Mahout to 
> Oryx or MLlib seems odd to me because Mahout is neither a server nor a loose 
> collection of algorithms. It was the latter until all of mapreduce was moved 
> to legacy and “no new mapreduce” was the rule.
>
> But what is it now? What is unique and of value? Is it destined to be late to 
> the party and chasing the algo checklists of things like MLlib?
>
> First a slight digression. I looked at moving itemsimilarity to raw Spark if 
> only to remove mrlegacy from the dependencies. At about the same time another 
> Mahouter asked the Spark list how to transpose a matrix. He got the answer 
> “why would you want to do that?” The fairly high performance algorithm behind 
> spark-itemsimilarity was designed by Sebastian and requires an optimized A’A, 
> A’B, A’C… and spark-rowsimilarity requires AA’. None of these are provided by 
> MLlib. No actual transpose is required so these two things should be seen as 
> separate comments about MLlib. The moral: unless I want to write optimized 
> matrix transpose-and-multiply solvers I will stick with Mahout.
>
> So back to Mahout’s unique value. Mahout today is a general linear algebra 
> lib and environment that performs optimized calculations on modern engines 
> like Spark. It is something like a Scala-fied R on Spark (or other engine).
>
> If this is true then spark-itemsimilarity can be seen as a package/add-on 
> that requires Mahout’s core Linear Algebra.
>
> Why use Mahout? Use it if you need scalable general linear algebra. That’s 
> not what MLlib does well.
>
> Should we be chasing MLlib’s algo list? Why would we? If we need some algo, 
> why not consume it directly from MLlib or somewhere else? Why is a 
> reimplementation important all else being equal?
>
> Is general scalable linear algebra sufficient for all important ML algos? 
> Certainly not. For instance streaming ones and in particular online updated 
> streaming algos may have little to gain from Mahout as it is today.
>
> If the above is true then Mahout is nothing like what it was in 0.9 and is 
> being unfairly compared to 0.9 and other things like that. This 
> misunderstanding of what Mahout _is_ leads to misapplied criticism and lack 
> of use for what it does well. At very least this all implies a very different 
> description on the CMS at most maybe something as drastic as a name change.
>
>
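
For readers unfamiliar with the products mentioned above: A'A, A'B and AA' can all be computed without ever materializing a transpose. A'B is the sum, over the rows the two matrices share (e.g. users), of the outer product of A's row with B's row, and AA' is just the dot product between every pair of A's rows. Below is a minimal single-machine sketch in plain Python, assuming sparse matrices stored as dicts of rows; the function names `at_b` and `a_at` and the toy data are hypothetical. It is not Mahout's or MLlib's API, nor the actual spark-itemsimilarity implementation, which does more than compute the raw product.

```python
from collections import defaultdict

def at_b(a_rows, b_rows):
    """Return A'B as a sparse dict {(i, j): value}, without transposing A.

    a_rows and b_rows map a shared row key (e.g. a user id) to a sparse
    row {column: value}. A'B is the sum over rows r of outer(A[r], B[r]),
    so each row contributes independently of the others.
    """
    result = defaultdict(float)
    for r, a_row in a_rows.items():
        b_row = b_rows.get(r)
        if not b_row:
            continue  # a row missing from B contributes nothing
        for i, a_val in a_row.items():
            for j, b_val in b_row.items():
                result[(i, j)] += a_val * b_val
    return dict(result)

def a_at(a_rows):
    """Return AA' as a sparse dict {(r, s): value}: pairwise row dot products."""
    result = {}
    for r, r_row in a_rows.items():
        for s, s_row in a_rows.items():
            dot = sum(v * s_row.get(c, 0.0) for c, v in r_row.items())
            if dot:
                result[(r, s)] = dot
    return result

# Hypothetical toy data: rows are users, columns are items (A) or another
# kind of interaction (B).
A = {"u1": {"i1": 1.0, "i2": 1.0}, "u2": {"i2": 1.0, "i3": 1.0}}
B = {"u1": {"v1": 1.0}, "u2": {"v1": 1.0, "v2": 1.0}}

AtA = at_b(A, A)  # item-item cooccurrence (the A'A case)
AtB = at_b(A, B)  # cross-cooccurrence (the A'B case)
AAt = a_at(A)     # row-row products (the AA' case)
```

With this data, `AtA[("i2", "i2")]` is 2.0 because both users touched item i2, while items i1 and i3 never co-occur, so that pair is absent from the sparse result. Because A'B decomposes by row, the per-row outer products can be computed on separate partitions and summed, which is the kind of optimization a distributed linear algebra layer can do for you.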
