The Apache Mahout PMC is pleased to announce the release of Mahout 0.11.0.

Mahout's goal is to create an environment for quickly creating machine learning 
applications that scale and run on the highest performance parallel computation 
engines available. Mahout comprises an interactive environment and library that 
supports generalized scalable linear algebra and includes many modern machine 
learning algorithms.


We call the Mahout math environment “Samsara,” after the term’s connotation of 
universal renewal. It reflects a fundamental rethinking of how scalable machine 
learning algorithms are built and customized. Mahout-Samsara is here to help 
people create their own math while providing some off-the-shelf algorithm 
implementations. At its base are general linear algebra and statistical 
operations along with the data structures to support them. It is written in 
Scala with Mahout-specific extensions, and runs most fully on Spark.


To get started with Apache Mahout 0.11.0, download the release artifacts and 
signatures from http://www.apache.org/dist/mahout/0.11.0/.


Many thanks to the contributors and committers who were part of this release. 
Please see below for the Release Highlights.



RELEASE HIGHLIGHTS


This is a minor release over Mahout 0.10.0, meant to introduce several new 
features and to fix some bugs. Mahout 0.11.0 includes all new features and 
bug fixes released in Mahout versions 0.10.1 and 0.10.2.



Mahout 0.11.0 new features compared to Mahout 0.10.0



  1.  Spark 1.3 support.

  2.  In-core transpose view rewrites. Transpose views are now modifiable, 
e.g. for (col <- a.t) col := 5.

  3.  Performance and parallelization improvements for the AB', A'B, and A'A 
Spark physical operators. This speeds up SimilarityAnalysis and its associated 
jobs, spark-itemsimilarity and spark-rowsimilarity.

  4.  Optional structural "flavor" abstraction for in-core matrices.  In-core 
matrices can now be tagged as e.g. sparse or dense.

  5.  %*% optimization based on matrix flavors.

  6.  In-core ::= sparse assignment functions.

  7.  Assign := optimization (do proper traversal based on matrix flavors, 
similarly to %*%).

  8.  Added in-place elementwise functional assignment, e.g. mxA := exp _, 
mxA ::= exp _.

  9.  Distributed and in-core versions of simple elementwise analogues of 
scala.math._. For example, for log(x) the convention is dlog(drm), mlog(mx), 
vlog(vec). Unfortunately these functions cannot be overloaded over what is 
defined in scala.math; i.e., Scala would not allow log(mx) or log(drm) 
alongside log(Double), mainly because they are defined in different packages.

  10. Distributed and in-core first- and second-moment routines. R analogs: 
mean(), colMeans(), rowMeans(), variance(), sd(). By convention, distributed 
versions are prefixed with the letter 'd': colMeanVars(), colMeanStdevs(), 
dcolMeanVars(), dcolMeanStdevs().

  11. Distance and squared-distance matrix routines. R analog: dist(). Both 
squared and non-squared Euclidean distance matrices are provided. By 
convention, distributed versions are prefixed with the letter 'd': dist(x), 
sqDist(x), dsqDist(x). There is also a variation for the pairwise distance 
matrix of two different inputs x and y: sqDist(x, y), dsqDist(x, y).

  12. DRM row sampling API.

  13. Distributed performance bug fixes. This relates mostly to (a) matrix 
multiplication deficiencies, and (b) handling parallelism.

  14. Distributed, engine-neutral allreduceBlock() operator API for Spark and 
H2O.

  15. Distributed optimizer operators for elementwise functions. Rewrites 
recognize e.g. 1 + drmX * dexp(drmX) as a single fused elementwise physical 
operator: elementwiseFunc(f1(f2(drmX))), where f1 = 1 + x and f2 = exp(x).

  16. More cbind, rbind flavors (e.g. 1 cbind mxX, 1 cbind drmX or the other 
way around) for Spark and H2O.

  17. Added +=: and *=: operators on vectors.

  18. Closeable API for broadcast tensors.

  19. Support for conversion of any type-keyed DRM into ordinally-keyed DRM.

  20. Scala logging style.

  21. rowSumsMap() summary for non-int-keyed DRMs.

  22. elementwise power operator ^ .

  23. R-like vector concatenation operator.

  24. In-core functional assignments, e.g. mxA := { (x) => x * x }.

  25. Straighten out behavior of Matrix.iterator() and iterateNonEmpty().

  26. New mutable transposition view for in-core matrices. The in-core 
transpose view was rewritten with two main goals: (1) enable mutability, 
e.g. for (col <- mxA.t) col := k; (2) translate matrix structural flavor 
correctly for optimizers, e.g. new SparseRowMatrix.t carries on as a 
column-major structure.

  27. Native support for Kryo serialization of tensor types.

  28. Deprecation of MultiLayerPerceptron, ConcatenateVectorsJob and all 
related classes.

  29. Deprecation of SparseColumnMatrix.

  30. Fixes for a major memory usage bug in co-occurrence analysis used by the 
driver spark-itemsimilarity. This will now require far less memory in the 
executor.

  31. Some minor fixes to Mahout-Samsara QR Decomposition and matrix ops.

  32. Package size trimmed down to under 200 MB.
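The elementwise-function and distance conventions above (items 9 and 11) can be 
sketched in plain Scala. This is an illustrative analogue only, not Mahout 
code: SamsaraSketch and its array-based helpers are hypothetical stand-ins, 
whereas the real routines operate on Mahout in-core matrices, vectors, and 
DRMs.

```scala
// Plain-Scala sketch of two Mahout 0.11.0 conventions (NOT the Mahout API):
// (1) elementwise analogues of scala.math._, named per tensor kind, e.g.
//     vlog(vec) for vectors and mlog(mx) for matrices;
// (2) sqDist(x, y): the pairwise squared Euclidean distance matrix whose
//     entry (i, j) is the squared distance between row i of x and row j of y.
object SamsaraSketch {
  type Vec = Array[Double]
  type Mx  = Array[Vec]

  // Analogue of vlog(vec): elementwise natural log of a vector.
  def vlog(v: Vec): Vec = v.map(math.log)

  // Analogue of mlog(mx): elementwise natural log of a matrix, row by row.
  def mlog(m: Mx): Mx = m.map(vlog)

  // Analogue of sqDist(x, y): entry (i, j) = ||x(i) - y(j)||^2.
  def sqDist(x: Mx, y: Mx): Mx =
    x.map(xi => y.map(yj => xi.zip(yj).map { case (a, b) => (a - b) * (a - b) }.sum))

  def main(args: Array[String]): Unit = {
    val x = Array(Array(0.0, 0.0), Array(3.0, 4.0))
    val y = Array(Array(0.0, 0.0))
    // Squared distances of each row of x to the single row of y: 0 and 25.
    println(sqDist(x, y).map(_.mkString(" ")).mkString("\n"))
  }
}
```

In Mahout itself the distributed variants carry the 'd' prefix (dlog, dsqDist) 
and fuse into physical operators per item 15, rather than materializing arrays 
as this sketch does.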



Note: Mahout 0.11.0 artifacts appear to be binary compatible with Spark 1.4.



STATS


A total of 48 separate JIRA issues are addressed in this release [2] with 7 
bugfixes.



GETTING STARTED


Download the release artifacts and signatures at 
http://www.apache.org/dist/mahout/0.11.0/. The examples directory contains 
several working examples of the core functionality available in Mahout. These 
can be run via scripts in the examples/bin directory. Most examples do not need 
a Hadoop cluster in order to run.



FUTURE PLANS


Integration with Apache Flink is in the works, in collaboration with TU Berlin 
and Data Artisans, to add Flink as the third execution engine for Mahout, in 
addition to the existing Apache Spark and H2O engines.


To see progress on this branch look here: 
https://github.com/apache/mahout/commits/master.



KNOWN ISSUES


In the non-source zip or tar, the example data for 
mahout/examples/bin/run-item-sim is missing. To run it, get the CSV files from 
GitHub (https://github.com/apache/mahout/tree/mahout-0.10.x/examples/src/main/resources) [4].



CREDITS


As with any release, we wish to thank all of the users and contributors to 
Mahout. Please see the CHANGELOG [1] and JIRA Release Notes [2] for individual 
credits, as there are too many to list here.



[1] https://github.com/apache/mahout/blob/master/CHANGELOG

[2] 
https://issues.apache.org/jira/browse/MAHOUT-1757?jql=project%20%3D%20MAHOUT%20AND%20status%20in%20%28Resolved%2C%20closed%29%20AND%20%28fixVersion%20%3D%200.10.1%20OR%20fixVersion%20%3D%200.10.2%20OR%20fixVersion%20%3D%200.11.0%29

[3] http://mahout.apache.org/developers/how-to-contribute.html

[4] https://github.com/apache/mahout/tree/master/examples/src/main/resources








