Hi Dr. Dunning,
I'm reluctant to admit that my feeling is similar to that of many of
Sean's customers. As a user of Mahout and Lucene/Solr, I see a lot of
similarities between the two cases:
Lucene | Mahout
indexing takes text as sparse vectors and builds an inverted index | training takes data as sparse vectors and builds a model
the inverted index lives in memory/HDFS | the model lives in memory/HDFS
queried with input text, returns matches with scores | queried with input test data, returns scores/labels
model selection by comparing the rank order of scores with ground truth | model selection by comparing scores/labels with ground truth
Then Lucene/Solr/Elasticsearch evolved into some of the most successful
flagship products (as buggy and incomplete as they are, they still
gained wide adoption that Mahout never achieved). Yet Mahout still
looks as if it were assembled with glue and duct tape. The major
difficulties I encountered are:
1. Components are not interchangeable: e.g. the data and model
representations for single-node CF are vastly different from those of
MR CF. New features sometimes introduce backward-incompatible
representations. This drastically demoralizes users who try to
integrate with Mahout while expecting improvements.
2. Components have strong dependencies on each other: e.g.
cross-validation of CF can only use the in-memory DataModel, which
SlopeOneRecommender cannot update properly (it has since been removed,
but you get my point). Such design issues never drew enough attention
beyond a "won't fix" resolution.
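To make the coupling complaint concrete, here is a toy sketch in plain Java. The class and method names (InMemoryDataModel, CoupledEvaluator, ratingFor, etc.) are hypothetical, not Mahout's real API; the point is only that an evaluator whose signature names one concrete model class shuts out every other backend, while an interface-typed parameter lets a distributed or HDFS-backed model plug in:

```java
// Hypothetical names -- an illustration of the coupling problem, not Mahout code.

// The concrete in-memory model class.
final class InMemoryDataModel {
    double ratingFor(long user, long item) { return 4.0; }
}

// What an interface-based design would expose instead.
interface DataModel {
    double ratingFor(long user, long item);
}

final class CoupledEvaluator {
    // Signature forces the concrete class: an HDFS-backed model
    // cannot be cross-validated at all.
    double evaluate(InMemoryDataModel model) {
        return model.ratingFor(1L, 2L);
    }
}

final class DecoupledEvaluator {
    // Accepts any implementation, so other backends plug in freely.
    double evaluate(DataModel model) {
        return model.ratingFor(1L, 2L);
    }
}

public class CouplingDemo {
    public static void main(String[] args) {
        DataModel hdfsBacked = (user, item) -> 3.5; // stand-in for a distributed model
        DecoupledEvaluator eval = new DecoupledEvaluator();
        System.out.println(eval.evaluate(hdfsBacked)); // prints 3.5
        // new CoupledEvaluator().evaluate(hdfsBacked); // would not compile
    }
}
```

With the coupled signature, the only fix for a new backend is copying data into the in-memory model first, which is exactly the kind of workaround a "won't fix" answer leaves users with.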
3. Many models can only be used internally and cannot be exported or
reused in other applications. This is true of Solr as well, but its
RESTful API is very universal, and many ETL tools have been built for
it. In contrast, Mahout has a very steep learning curve for non-Java
developers.
It wouldn't be bad to see Mahout become a service on top of a library,
if it doesn't take too much effort.
Yours, Peng
On Sun 02 Mar 2014 11:45:33 AM EST, Ted Dunning wrote:
Ravi,
Good points.
On Sun, Mar 2, 2014 at 12:38 AM, Ravi Mummulla <ravi.mummu...@gmail.com>wrote:
- Natively support Windows (guidance, etc. No documentation exists today,
for instance)
There is a bit of demand for that.
- Faster time to first application (from discovery to first application
currently takes a non-trivial amount of effort; how can we lower the bar
and reduce the friction for adoption?)
There is huge evidence that this is important.
- Better documenting use cases with working samples/examples
(Documentation on https://mahout.apache.org/users/basics/algorithms.html
is spread out and there is too much focus on algorithms as opposed to
use cases - this is an adoption blocker)
This is also important.
- Uniformity of the API set across all algorithms (are we providing the
same experience across all APIs?)
And many people have been tripped up by this.
- Measuring/publishing scalability metrics of various algorithms (why
would we want users to adopt Mahout vs. other frameworks for ML at
scale?)
I don't see this as being as important as some of your other points,
but it is still useful.