How much would be involved in changing the name of a top-level project? I'd prefer to avoid the overhead of going back into incubation.
I agree 0.10 makes more sense.

On Wed, Feb 25, 2015 at 12:16 PM, Sean Owen <sro...@gmail.com> wrote:
> My $0.02:
>
> There is no shortage of algorithm libraries that are in some way
> runnable on Hadoop out there, and not as many easy-to-use distributed
> matrix operation libraries. I think it's more additive to the
> ecosystem to solve that narrow, and deep, linear algebra problem and
> really nail it. That's a pretty good 'identity' to claim. It seems
> like an appropriate scope.
>
> I do think the project has changed so much that it's more confusing to
> keep calling it Mahout than to change the name. I can't think of one
> person I've talked to about Mahout in the last 6 months that was not
> under the impression that what is in 0.9 has simply been ported to
> Spark. It's different enough that it could even be its own incubator
> project (under a different name).
>
> The brand recognition is for the deprecated part, so keeping that is
> almost the problem. It's not crazy to just change the name. Or even
> consider a re-incubation. It might give some latitude to more fully
> reboot.
>
> Releasing 1.0.0, on the other hand, means committing to the APIs (and
> name) for some fairly new code and fairly soon. Given that this is
> sort of a 0.1 of a new project, going to 1.0 feels semantically wrong.
> But a release would be good. Personally I'd suggest 0.10.
>
> On Wed, Feb 25, 2015 at 5:50 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
> > Looking back over the last year, Mahout has gone through a lot of
> > changes. Most users are still using the legacy mapreduce code and new
> > users have mostly looked elsewhere.
> >
> > The fact that people as knowledgeable as former committers compare
> > Mahout to Oryx or MLlib seems odd to me because Mahout is neither a
> > server nor a loose collection of algorithms. It was the latter until
> > all of mapreduce was moved to legacy and “no new mapreduce” was the
> > rule.
> >
> > But what is it now? What is unique and of value? Is it destined to be
> > late to the party and chasing the algo checklists of things like
> > MLlib?
> >
> > First a slight digression. I looked at moving itemsimilarity to raw
> > Spark if only to remove mrlegacy from the dependencies. At about the
> > same time another Mahouter asked the Spark list how to transpose a
> > matrix. He got the answer “why would you want to do that?” The fairly
> > high performance algorithm behind spark-itemsimilarity was designed by
> > Sebastian and requires an optimized A’A, A’B, A’C… and
> > spark-rowsimilarity requires AA’. None of these are provided by MLlib.
> > No actual transpose is required, so these two things should be seen as
> > separate comments about MLlib. The moral: unless I want to write
> > optimized matrix transpose-and-multiply solvers, I will stick with
> > Mahout.
> >
> > So back to Mahout’s unique value. Mahout today is a general linear
> > algebra lib and environment that performs optimized calculations on
> > modern engines like Spark. It is something like a Scala-fied R on
> > Spark (or other engine).
> >
> > If this is true then spark-itemsimilarity can be seen as a
> > package/add-on that requires Mahout’s core Linear Algebra.
> >
> > Why use Mahout? Use it if you need scalable general linear algebra.
> > That’s not what MLlib does well.
> >
> > Should we be chasing MLlib’s algo list? Why would we? If we need some
> > algo, why not consume it directly from MLlib or somewhere else? Why is
> > a reimplementation important, all else being equal?
> >
> > Is general scalable linear algebra sufficient for all important ML
> > algos? Certainly not. For instance, streaming ones, and in particular
> > online-updated streaming algos, may have little to gain from Mahout as
> > it is today.
> >
> > If the above is true then Mahout is nothing like what it was in 0.9
> > and is being unfairly compared to 0.9 and other things like that. This
> > misunderstanding of what Mahout _is_ leads to misapplied criticism and
> > lack of use for what it does well. At the very least this all implies
> > a very different description on the CMS; at most, maybe something as
> > drastic as a name change.