Amazon hosts some public data sets at http://aws.amazon.com/publicdatasets/ and http://aws.amazon.com/datasets
> On Oct 5, 2013, at 1:11 PM, Ted Dunning <[email protected]> wrote:
>
> I was asked to answer an anonymous question about the future of Mahout on
> Quora and thought I should share the answer here as well.
>
> That really depends on where the community of users wants to take Mahout.
>
> Some possibilities include:
>
> a) better classifiers. Mahout's capabilities in this respect include Naive
> Bayes, Random Forest and logistic regression trained via single threaded
> stochastic gradient descent (SGD). It would be good to have a high quality
> parallel implementation of SGD and it would be good to have some kind of
> deep learning as well. The random forest could also use some work.
>
> b) faster horses. I think that the sparse matrices can be made
> significantly faster even considering the cost-based optimizer versions
> that we already have. The addition of JBLAS support for dense matrices
> would also be interesting.
>
> c) better API interfaces. The clustering interfaces are a bit of a
> shambles in spite of the cool capabilities available with streaming k-means
> and friends.
>
> d) better human interfaces. It would be great to have products like
> Dataiku drive Mahout capabilities. Dataiku does a really great job of the
> cleansing end of machine learning and Mahout really has not much in that
> area. It would also be nice to move forward with Dmitriy Lyubimov's work
> on Scala bindings for Mahout.
>
> e) bigger community. There are some closely related communities like the
> folks working on Spark with MLI. More cross fertilization would be very
> cool.
>
> f) more data. Getting sample data for testing is very hard. Getting data
> at scale is exceedingly hard. If people could suggest a good, big and
> freely available dataset, that would be awesome.
>
> None of these possibilities matter, however, if somebody doesn't do them.
> So the question to each reader of this answer is "What would you like to
> see and how can you help make that happen"?
