On Sep 5, 2009, at 9:41 AM, Sean Owen wrote:

To kind of wrap this up for now --

I hear some consensus that Mahout is about distributed, Hadoop-based
solutions for developers. So let's make sure we present a clean,
coherent API to developers wanting to run the project's Hadoop jobs.

I don't think we necessarily need to be distributed or Hadoop-based, but those are what we have led with so far, and it's a good start. The nice thing is that the stuff works just fine in standalone mode, too. First and foremost, we are a machine learning project with a commercially friendly license and a solid community aiming to build fast, production-ready libraries. Java, Hadoop, and distributed are important, but secondary in my mind. There will certainly be some algorithms that we can't implement in Hadoop. See http://www.lucidimagination.com/search/document/ab7915e98d707194/thought_offering_ec2_s3_based_services .

+1 to coherent API, but that is always evolving, too.



I think we're a little bit stuck now as Hadoop 0.20.0 is a little bit
busted. But as it moves forward, perhaps I can volunteer to suggest
changes to unify the various jobs, mappers, reducers, etc. across the
project.


Cool.


Sean

On Fri, Sep 4, 2009 at 11:21 PM, Grant Ingersoll<gsing...@apache.org> wrote:

On Sep 4, 2009, at 1:07 PM, Ted Dunning wrote:

These are good questions to ask. I don't know that we are ready to answer
them, but I do think that we have pieces of the answers.

So far, there are three or four general themes that seem to be of real
interest/value:

a) taste/collaborative filtering/cooccurrence analysis

b) facilitation of conventional machine learning by large scale
aggregation using Hadoop (so far, this is largely cooccurrence counting)

c) standard and basic machine learning tasks like clustering, simple
classifiers running on large scale data

d) stuff

I'd add a few non-technical things I find useful:

e)  Non-viral License

f) Community supporting it (i.e. not abandoned) and a place to get answers
about practical problems.

I've been frustrated more than once by the lack of (e) and (f) on some other projects. I'm not saying we have completely solved (f) yet (we could use a bit more diversity in the people answering, though that is starting to take hold, too),
but I do firmly believe Apache is one of the best places to build a
community.

-Grant


--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search
