I was asked to answer an anonymous question about the future of Mahout on
Quora and thought I should share the answer here as well.

That really depends on where the community of users wants to take Mahout.

Some possibilities include:

a) better classifiers.  Mahout's capabilities in this respect include Naive
Bayes, random forest and logistic regression trained via single-threaded
stochastic gradient descent (SGD).  It would be good to have a high-quality
parallel implementation of SGD and it would be good to have some kind of
deep learning as well.  The random forest could also use some work.

b) faster horses.  I think that the sparse matrices can be made
significantly faster even considering the cost-based optimizer versions
that we already have.  The addition of JBLAS support for dense matrices
would also be interesting.

c) better APIs.  The clustering interfaces are a bit of a shambles in
spite of the cool capabilities available with streaming k-means and
friends.

d) better human interfaces.  It would be great to have products like
Dataiku drive Mahout capabilities.  Dataiku does a really great job of the
cleansing end of machine learning and Mahout has very little in that
area.  It would also be nice to move forward with Dmitriy Lyubimov's work
on Scala bindings for Mahout.

e) bigger community.  There are some closely related communities like the
folks working on Spark with MLI.  More cross-fertilization would be very
cool.

f) more data.  Getting sample data for testing is very hard.  Getting data
at scale is exceedingly hard.  If people could suggest a good, big and
freely available dataset, that would be awesome.

None of these possibilities matters, however, if nobody does the work.
So the question to each reader of this answer is "What would you like to
see and how can you help make that happen?"
