I was asked to answer an anonymous question about the future of Mahout on Quora and thought I should share the answer here as well.
That really depends on where the community of users wants to take Mahout. Some possibilities include:

a) better classifiers. Mahout's capabilities in this respect include Naive Bayes, Random Forest and logistic regression trained via single-threaded stochastic gradient descent (SGD). It would be good to have a high-quality parallel implementation of SGD, and it would be good to have some kind of deep learning as well. The random forest could also use some work.

b) faster horses. I think that the sparse matrices can be made significantly faster, even considering the cost-based optimizer versions that we already have. The addition of JBLAS support for dense matrices would also be interesting.

c) better API interfaces. The clustering interfaces are a bit of a shambles in spite of the cool capabilities available with streaming k-means and friends.

d) better human interfaces. It would be great to have products like Dataiku drive Mahout capabilities. Dataiku does a really great job on the data cleansing end of machine learning, and Mahout really doesn't have much in that area. It would also be nice to move forward with Dmitriy Lyubimov's work on Scala bindings for Mahout.

e) bigger community. There are some closely related communities, like the folks working on Spark with MLI. More cross-fertilization would be very cool.

f) more data. Getting sample data for testing is very hard. Getting data at scale is exceedingly hard. If people could suggest a good, big and freely available dataset, that would be awesome.

None of these possibilities matter, however, if somebody doesn't do them. So the question to each reader of this answer is "What would you like to see, and how can you help make that happen?"
