Hi All,

Please see below the first draft of the release notes for Mahout 0.9. Please feel
free to add/edit sections as you see fit.
(This is a draft only.)

Regards,
Suneel


---------------------------------


The Apache Mahout PMC is pleased to announce the release of Mahout 0.9. 
Mahout's goal is to build scalable machine learning libraries focused 
primarily in the areas of collaborative filtering (recommenders), 
clustering and classification (known collectively as the "3Cs"), as well as the 
necessary infrastructure to support those implementations including, but
not limited to, math packages for statistics, linear algebra and others
as well as Java primitive collections, local and distributed vector and
matrix classes and a variety of integrative code to work with popular 
packages like Apache Hadoop, Apache Lucene, Apache HBase, Apache 
Cassandra and much more. The 0.9 release is mainly a cleanup release in
preparation for an upcoming 1.0 release targeted for the first half of 2014, but
there are a few significant new features, which are highlighted below.

To get started with Apache Mahout 0.9, download the release artifacts and
signatures at http://www.apache.org/dyn/closer.cgi/mahout or visit the central
Maven repository.

In addition to the release highlights and artifacts, please pay attention
to the section labelled FUTURE PLANS below for more information about
upcoming releases of Mahout.

As with any release, we wish to thank all of the users and contributors 
to Mahout. Please see the CHANGELOG [1] and JIRA Release Notes [2] for 
individual credits, as there are too many to list here.

GETTING STARTED

In the release package, the examples directory contains several working
examples of the core functionality available in Mahout. These can be run via
scripts in the examples/bin directory and will prompt you for more information
to help you try things out. Most examples do not need a Hadoop cluster in
order to run.

RELEASE HIGHLIGHTS

The highlights of the Apache Mahout 0.9 release include, but are not
limited to, the list below. For further information, see the included
CHANGELOG file.

- Scala DSL Bindings for Mahout Math Linear Algebra (MAHOUT-1297). See
  http://weatheringthrutechdays.blogspot.com/2013/07/scala-dsl-for-mahout-in-core-linear.html
- New Multilayer Perceptron Classifier (MAHOUT-1265)
- Recommenders as a Search (MAHOUT-1288). See
  https://github.com/pferrel/solr-recommender
- Upgrade Mahout to be Lucene 4.6.0 compliant (MAHOUT-1364)
- Online Algorithm for computing accurate Quantiles using 1-dimensional
  Clustering (MAHOUT-1361). See
  https://github.com/tdunning/t-digest/blob/master/docs/theory/t-digest-paper/histo.pdf
  for the details.
- Removed deprecated algorithms (listed below).

- The usual bug fixes. See JIRA [2] for more information on the 0.9 release.


A total of 91 separate JIRA issues were addressed in this release.

The following algorithms that were marked deprecated in 0.8 have been removed 
in 0.9:

- From Clustering:
  Dirichlet - replaced by Collapsible Variational Bayes (CVB)

  Meanshift 

  MinHash - removed due to poor performance and lack of usage

  EigenCuts -


- From Classification (both are sequential implementations)

  Winnow - lack of actual usage

  Perceptron - lack of actual usage 


- Frequent Pattern Mining

- Collaborative Filtering
    All recommenders in org.apache.mahout.cf.taste.impl.recommender.knn
    SlopeOne implementations in org.apache.mahout.cf.taste.hadoop.slopeone and
      org.apache.mahout.cf.taste.impl.recommender.slopeone
    Distributed pseudo recommender in org.apache.mahout.cf.taste.hadoop.pseudo
    TreeClusteringRecommender in org.apache.mahout.cf.taste.impl.recommender

- Mahout Math
    Lanczos - removed in favour of SSVD
    Hadoop entropy code in org.apache.mahout.math.stats.entropy

If you are interested in supporting one or more of these algorithms, please make
it known on [email protected] and via JIRA issues that fix and/or improve
them. Please also provide supporting evidence as to their effectiveness for you
in production.


CONTRIBUTING

Mahout is always looking for contributions focused on the 3Cs. If you are
interested in contributing, please see our contribution page,
https://cwiki.apache.org/MAHOUT/how-to-contribute.html, on the Mahout wiki or
contact us via email at [email protected].

FUTURE PLANS

1.0 Plans
------------


- New Downpour SGD classifier 

- Support for Finite State Transducers (FST) as a Dictionary Type.
- Support for Hadoop 2.x
- Port Mahout's recommenders to Spark (??)
- Support for Java 7
- Better API interfaces for Clustering
- (what else???)


As the project moves towards a 1.0 release, the community will be focused on
key algorithms that are proven to scale in production and have seen
widespread adoption.

Our plans as a community are to focus 1.0 on the support of algorithms and
features listed above.
We will support the algorithms packaged in 1.0 for at least two minor versions
after the 1.0 release.
In the case of removal after 1.0, we will deprecate the functionality in the
1.(x+1) minor release and remove it in the 1.(x+2) release. For instance, if
feature X is to be removed after the 1.2 release, it will be deprecated in 1.3
and removed in 1.4.
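
The deprecation arithmetic above can be sketched in a few lines of Java. This is
only an illustration of the stated 1.(x+1) / 1.(x+2) rule; the class and method
names (DeprecationSchedule, deprecatedInMinor, removedInMinor, describe) are
hypothetical and are not part of Mahout.

```java
/** Hypothetical helper illustrating the post-1.0 deprecation rule; not Mahout code. */
final class DeprecationSchedule {

    private DeprecationSchedule() {
    }

    /** Rejects negative minor versions, which have no meaning in a 1.x series. */
    private static void checkMinor(int minor) {
        if (minor < 0) {
            throw new IllegalArgumentException("minor version must be >= 0: " + minor);
        }
    }

    /** For a removal decided after release 1.x, returns x+1 (the deprecating minor release). */
    static int deprecatedInMinor(int decidedAfterMinor) {
        checkMinor(decidedAfterMinor);
        return decidedAfterMinor + 1;
    }

    /** For a removal decided after release 1.x, returns x+2 (the removing minor release). */
    static int removedInMinor(int decidedAfterMinor) {
        checkMinor(decidedAfterMinor);
        return decidedAfterMinor + 2;
    }

    /** Formats the schedule for a feature as a single human-readable line. */
    static String describe(String feature, int decidedAfterMinor) {
        return String.format("%s: deprecated in 1.%d, removed in 1.%d",
                feature,
                deprecatedInMinor(decidedAfterMinor),
                removedInMinor(decidedAfterMinor));
    }

    public static void main(String[] args) {
        // Reproduces the example from the text: removal decided after 1.2.
        System.out.println(describe("feature X", 2));
        // prints "feature X: deprecated in 1.3, removed in 1.4"
    }
}
```

Under this reading, a feature always ships in at least one release carrying a
deprecation marker before it disappears.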

[1] http://svn.apache.org/viewvc/mahout/trunk/CHANGELOG?revision=1552746&view=markup
[2] https://issues.apache.org/jira/browse/MAHOUT-1376?jql=project%20%3D%20MAHOUT%20AND%20fixVersion%20%3D%20%220.9%22
