=== Apache Mahout Status Report: March 2013 ===

Apache Mahout provides implementations of machine learning algorithms
(collaborative filtering, clustering, classification, and
more) for large-scale data, mostly via Hadoop-based
implementations.

Issues:

Sean Owen wishes to leave the Mahout PMC while retaining his commit rights.
This is the only issue that needs the Board's attention.

Current Activity: How has the community developed since the last
report? In February:

We originally planned to release 0.8 by March 8, but are letting that
date slip by a few weeks.

Selection of Presentations, Articles and Outreach:

* Ted Dunning on new fast streaming clustering
(http://www.slideshare.net/tdunning/news-frommahout20130305)
* Fast clustering at ACM http://www.slideshare.net/tdunning/acm-20130225
* Real time learning http://www.slideshare.net/tdunning/real-time-learning
* MapR-Lucidworks on reflected intelligence
http://www.slideshare.net/tdunning/mapr-lucidworks-joint-webinar
* Ted Dunning at Strata on Mahout
http://www.slideshare.net/tdunning/strata-newyork2012
* Ted Dunning on fast clustering at Oxford
http://www.slideshare.net/tdunning/oxford-05oct2012
* MapR and Amex speak about large-scale analytics with Mahout
http://www.slideshare.net/tdunning/customer-analysisatscalestrata10022012
* Overstock and Mahout
http://www.wired.com/wiredenterprise/2012/12/mahout/
* Advanced Analytics in Mahout
http://portfortune.wordpress.com/2012/12/05/advanced-analytics-in-hadoop-part-one
* London Data Science http://datasciencelondon.org/tag/mahout/
* Mahout Updated in CDH 4.1
http://blog.cloudera.com/blog/2012/11/whats-new-in-cdh4-1-mahout/

Scientific publications based on Mahout

* _Sebastian Schelter, Sean Owen:
Collaborative Filtering with Apache Mahout_,
Recommender Systems Challenge Workshop in conjunction with ACM RecSys 2012
http://ssc.io/wp-content/uploads/2013/02/cf-mahout.pdf
* _Sebastian Schelter, Christoph Boden, Volker Markl: Scalable
Similarity-Based Neighborhood Methods with MapReduce_,
ACM Conference on Recommender Systems 2012, Dublin
http://dl.acm.org/citation.cfm?id=2365984
http://ssc.io/wp-content/uploads/2012/06/rec11-schelter.pdf


Code

We were able to attract the developer of one of the leading scientific
recommender libraries [http://mymedialite.net/] to port a few
implementations to Mahout
([MAHOUT-1106|https://issues.apache.org/jira/browse/MAHOUT-1106],
 [MAHOUT-1089|https://issues.apache.org/jira/browse/MAHOUT-1089]).

However, new code contributions have slowed to a crawl. The number of
commits in the past few months, compared to the same months in prior years:

Feb 2013, 7
Jan 2013, 20
Dec 2012, 7

Feb 2012, 98
Jan 2012, 27
Dec 2011, 99

Feb 2011, 35
Jan 2011, 52
Dec 2010, 37

Feb 2010, 207
Jan 2010, 132
Dec 2009, 135

New Commercial Integrations

* Predixion Readmission Insight, "a preventable readmission healthcare
solution", announced integration with Mahout, Greenplum, Hive, and
Microsoft's BI stack
[http://www.virtual-strategy.com/2013/03/05/predixion-software-wins-microsoft-health-users-group-innovation-award]
* Overstock and Mahout http://www.wired.com/wiredenterprise/2012/12/mahout

New Open Source Integrations

* The recommendation and advertising network plista
(http://www.plista.com/en) has built an open source web layer for
Mahout's recommenders https://github.com/plista/kornakapi
* Mahout seems to be the framework of choice for PredictionIO
http://prediction.io/, an open source prediction server for software
developers to create predictive features, such as personalization,
recommendation and content discovery

Mailing List Summary:

User list discussions currently focus primarily on bug reporting and
helping new users, with very little discussion of future feature work.

Developer Mailing List Posting:

[http://mail-archives.apache.org/mod_mbox/mahout-dev/]
Feb 2013, 123
Jan 2013, 213
Dec 2012, 155

as compared to the same months in previous years:
Feb 2012, 578
Jan 2012, 545
Dec 2011, 1079

and

Feb 2011, 352
Jan 2011, 473
Dec 2010, 267

We have not had developer involvement this low since the first half of 2009.

User Mailing List Posting

[http://mail-archives.apache.org/mod_mbox/mahout-user/]
User list discussions are primarily in support of very new users, as well
as bug reporting on released versions (0.6 and sometimes even 0.5),
highlighting the need for 0.8 to be released.

Traffic to the user mailing list has gone down slightly compared with
previous years:

Feb 2012, 288
Jan 2012, 367

Feb 2011, 359
Jan 2011, 458

Feb 2010, 497
Jan 2010, 272

This is not a dramatic decrease, and there is still considerable
interest in the user community.

Summary: How has the project developed since the last report?

A 1.0 release is not yet on the horizon.

== Milestones ==
1.) Working towards a 0.8 release
2.) Development on new, faster clustering code


-- 

  -jake
