=== Apache Mahout Status Report: March 2013 ===

Apache Mahout provides implementations of machine learning algorithms (collaborative filtering, clustering, classification, and more) for large-scale data, mostly via Hadoop-based implementations.

Issues:

Sean Owen wishes to leave the Mahout PMC while retaining his commit rights. This is the only issue that needs the Board's attention.

Current Activity: How has the community developed since the last report?

In February: The 0.8 release was originally planned for March 8, but we will let that date slip by a few weeks.

Selection of Presentations, Articles and Outreach:

* Ted Dunning on new fast streaming clustering (http://www.slideshare.net/tdunning/news-frommahout20130305)
* Fast clustering at ACM http://www.slideshare.net/tdunning/acm-20130225
* Real time learning http://www.slideshare.net/tdunning/real-time-learning
* MapR-Lucidworks on reflected intelligence http://www.slideshare.net/tdunning/mapr-lucidworks-joint-webinar
* Ted Dunning at Strata on Mahout http://www.slideshare.net/tdunning/strata-newyork2012
* Ted Dunning on fast clustering at Oxford http://www.slideshare.net/tdunning/oxford-05oct2012
* MapR and Amex speak about large-scale analytics with Mahout http://www.slideshare.net/tdunning/customer-analysisatscalestrata10022012
* Overstock and Mahout http://www.wired.com/wiredenterprise/2012/12/mahout/
* Advanced Analytics in Mahout http://portfortune.wordpress.com/2012/12/05/advanced-analytics-in-hadoop-part-one
* London Data Science http://datasciencelondon.org/tag/mahout/
* Mahout Updated in CDH 4.1 http://blog.cloudera.com/blog/2012/11/whats-new-in-cdh4-1-mahout/

Scientific publications based on Mahout

* _Sebastian Schelter, Sean Owen: Collaborative Filtering with Apache Mahout_, Recommender Systems Challenge Workshop in conjunction with ACM RecSys 2012 [pdf|http://ssc.io/wp-content/uploads/2013/02/cf-mahout.pdf]
* _Sebastian Schelter, Christoph Boden, Volker Markl: Scalable Similarity-Based Neighborhood Methods with MapReduce_, ACM Conference on Recommender Systems 2012, Dublin http://dl.acm.org/citation.cfm?id=2365984 http://ssc.io/wp-content/uploads/2012/06/rec11-schelter.pdf

Code

We were able to attract the developer of one of the leading
scientific recommender libraries [http://mymedialite.net/] to port a few implementations to Mahout ([MAHOUT-1106|https://issues.apache.org/jira/browse/MAHOUT-1106], [MAHOUT-1089|https://issues.apache.org/jira/browse/MAHOUT-1089]).

However, new code contributions have slowed to a crawl. The number of commits in the past few months, compared to the same months in prior years:

Feb 2013, 7
Jan 2013, 20
Dec 2012, 7

Feb 2012, 98
Jan 2012, 27
Dec 2011, 99

Feb 2011, 35
Jan 2011, 52
Dec 2010, 37

Feb 2010, 207
Jan 2010, 132
Dec 2009, 135

New Commercial Integrations

* Predixion Readmission Insight, a "preventable readmission healthcare solution", announced [http://www.virtual-strategy.com/2013/03/05/predixion-software-wins-microsoft-health-users-group-innovation-award] integration with Mahout, Greenplum, Hive, and Microsoft's BI stack.
* Overstock and Mahout http://www.wired.com/wiredenterprise/2012/12/mahout

New Open Source Integrations

* The recommendation and advertisement network http://www.plista.com/en has built an open source web layer for Mahout's recommenders https://github.com/plista/kornakapi
* Mahout seems to be the framework of choice for PredictionIO http://prediction.io/, an open source prediction server for software developers to create predictive features, such as personalization, recommendation and content discovery

Mailing List Summary:

User list discussions are currently focussed primarily on bug reporting and helping new users, with very little discussion of future feature work.

Developer Mailing List Posting: [http://mail-archives.apache.org/mod_mbox/mahout-dev/]

Feb 2013, 123
Jan 2013, 213
Dec 2012, 155

as compared to the same months in previous years:

Feb 2012, 578
Jan 2012, 545
Dec 2011, 1079

and

Feb 2011, 352
Jan 2011, 473
Dec 2010, 267

We have not had developer involvement this low since the first half of 2009.

User Mailing List Posting [http://mail-archives.apache.org/mod_mbox/mahout-user/]

User list discussions are primarily in support of very new users, as well as bug reporting on released versions (0.6 and sometimes even 0.5), highlighting the need for 0.8 to be released. Traffic to the user mailing list has gone down slightly from previous years:

Feb 2012, 288
Jan 2012, 367

Feb 2011, 359
Jan 2011, 458

Feb 2010, 497
Jan 2010, 272

This is not a dramatic decrease; there is still considerable interest in the user community.

Summary: How has the project developed since the last report?

A 1.0 release is not yet on the horizon.

== Milestones ==

1.) Working towards a 0.8 release
2.) Development on new, faster clustering code

-- -jake
