Amazon hosts some public data sets at http://aws.amazon.com/publicdatasets/ and 
http://aws.amazon.com/datasets 

> On Oct 5, 2013, at 1:11 PM, Ted Dunning <[email protected]> wrote:
> 
> I was asked to answer an anonymous question about the future of Mahout on
> Quora and thought I should share the answer here as well.
> 
> That really depends on where the community of users wants to take Mahout.
> 
> Some possibilities include:
> 
> a) better classifiers.  Mahout's capabilities in this respect include Naive
> Bayes, Random Forest and logistic regression trained via single-threaded
> stochastic gradient descent (SGD).  It would be good to have a high-quality
> parallel implementation of SGD and it would be good to have some kind of
> deep learning as well.  The random forest could also use some work.
> 
> b) faster horses.  I think that the sparse matrices can be made
> significantly faster even considering the cost-based optimizer versions
> that we already have.  The addition of JBLAS support for dense matrices
> would also be interesting.
> 
> c) better API interfaces.  The clustering interfaces are a bit of a
> shambles in spite of the cool capabilities available with streaming k-means
> and friends.
> 
> d) better human interfaces.  It would be great to have products like
> Dataiku drive Mahout capabilities.  Dataiku does a really great job of the
> cleansing end of machine learning and Mahout really does not have much in
> that area.  It would also be nice to move forward with Dmitriy Lyubimov's work
> on Scala bindings for Mahout.
> 
> e) bigger community.  There are some closely related communities like the
> folks working on Spark with MLI.  More cross-fertilization would be very
> cool.
> 
> f) more data.  Getting sample data for testing is very hard.  Getting data
> at scale is exceedingly hard.  If people could suggest a good, big and
> freely available dataset, that would be awesome.
> 
> None of these possibilities matter, however, if somebody doesn't do them.
> So the question to each reader of this answer is "What would you like to
> see and how can you help make that happen?"
