I want to share my opinion on the vision, too. I tried to capture points raised in various discussions on the dev list, and I hope most of them reflect the common excitement the new Mahout arouses.
To me, the fundamental benefit of the shift Mahout is undergoing is a cleaner separation between its layers: the distributed execution engine, the distributed data structures, the matrix computations, and the algorithms. This will let Mahout users and developers in different roles focus on the parts of the framework relevant to them:

1. A machine learning scientist can use the matrix language and the decompositions to implement new algorithms independently of the underlying distributed execution engine (which implies that the current distributed Mahout algorithms are to be rewritten in the matrix language).
2. A math-scala module contributor can add new functions or improve existing ones (the set of decompositions is an example) for the benefit of higher-level algorithms, together with optimization plans (e.g., for the case where two matrices are partitioned in the same way). The concrete implementations of those optimizations are delegated to the distributed execution engine layer.
3. A distributed execution engine author can add machine learning capabilities to her platform by providing i) concrete Matrix and Matrix I/O implementations, ii) partitioning, checkpointing, and broadcasting behaviors, and iii) BLAS.
4. A Mahout user with access to a cluster running a Mahout-supporting distributed execution engine can run machine learning algorithms implemented on top of the matrix language.

Best,
Gokhan

On Tue, May 20, 2014 at 8:30 PM, Dmitriy Lyubimov <[email protected]> wrote:

> inline
>
>
> On Tue, May 20, 2014 at 12:42 AM, Sebastian Schelter <[email protected]>
> wrote:
>
> >
> > Let's take the text from our homepage as a starting point. What should we
> > add/remove/modify?
> >
> > ----------------------------------------------------------------------------
> > The Mahout community decided to move its codebase onto modern data
> > processing systems that offer a richer programming model and more efficient
> > execution than Hadoop MapReduce.
> > Mahout will therefore reject new MapReduce algorithm implementations from
> > now on. We will however keep our widely used MapReduce algorithms in the
> > codebase and maintain them.
> >
> > We are building our future implementations on top of a
>
> Scala
>
> > DSL for linear algebraic operations which has been developed over the last
> > months. Programs written in this DSL are automatically optimized and
> > executed in parallel for Apache Spark.
>
> More platforms to be added in the future.
>
> >
> > Furthermore, there is an experimental contribution undergoing which aims
> > to integrate the H2O platform into Mahout.
> > ----------------------------------------------------------------------------
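To make the division of labor in points 2 and 3 more concrete, here is a minimal sketch in Python. Everything in it is hypothetical: `DistributedMatrix`, `Engine`, `LocalEngine`, `add`, and `collect` are illustrative names, not Mahout's Scala DSL or its engine bindings. The math-layer function `add` encodes one optimization plan (if both operands are already partitioned the same way, combine blocks directly; otherwise repartition one operand first), while the engine layer only supplies the primitives it delegates to.

```python
# Hypothetical sketch: none of these names are Mahout APIs.
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, List, Tuple

Row = List[float]
Block = List[Row]


@dataclass
class DistributedMatrix:
    """Engine-agnostic handle: row blocks plus the row count of each block."""
    blocks: List[Block]
    partitioning: Tuple[int, ...]


class Engine(ABC):
    """Primitives a distributed execution engine would supply (point 3)."""

    @abstractmethod
    def repartition(self, m: DistributedMatrix, partitioning: Tuple[int, ...]) -> DistributedMatrix:
        ...

    @abstractmethod
    def zip_blocks(self, a: DistributedMatrix, b: DistributedMatrix,
                   fn: Callable[[Block, Block], Block]) -> DistributedMatrix:
        ...


class LocalEngine(Engine):
    """Toy single-process engine standing in for Spark or H2O; counts repartitions."""

    def __init__(self) -> None:
        self.shuffles = 0

    def repartition(self, m, partitioning):
        self.shuffles += 1  # a real engine would move data across the cluster here
        rows = [row for block in m.blocks for row in block]
        blocks, start = [], 0
        for size in partitioning:
            blocks.append(rows[start:start + size])
            start += size
        return DistributedMatrix(blocks, tuple(partitioning))

    def zip_blocks(self, a, b, fn):
        # Assumes a and b are identically partitioned: block i pairs with block i.
        return DistributedMatrix([fn(x, y) for x, y in zip(a.blocks, b.blocks)], a.partitioning)


def add(engine: Engine, a: DistributedMatrix, b: DistributedMatrix) -> DistributedMatrix:
    """Math layer (point 2): elementwise A + B with one optimization plan.

    If A and B are already partitioned the same way, blocks are combined
    directly; otherwise B is first repartitioned to match A.
    """
    if a.partitioning != b.partitioning:
        b = engine.repartition(b, a.partitioning)

    def add_block(x: Block, y: Block) -> Block:
        return [[u + v for u, v in zip(rx, ry)] for rx, ry in zip(x, y)]

    return engine.zip_blocks(a, b, add_block)


def collect(m: DistributedMatrix) -> List[Row]:
    """Gather all rows into one local list (for inspection only)."""
    return [row for block in m.blocks for row in block]


# Usage: a 3x2 matrix A, and a 3x2 matrix B of ones split into different blocks.
engine = LocalEngine()
a = DistributedMatrix([[[1.0, 2.0]], [[3.0, 4.0], [5.0, 6.0]]], (1, 2))
b = DistributedMatrix([[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0]]], (2, 1))
c = add(engine, a, a)   # co-partitioned: no repartition needed
d = add(engine, a, b)   # partitioning differs: B is repartitioned once
print(collect(c))       # [[2.0, 4.0], [6.0, 8.0], [10.0, 12.0]]
print(collect(d))       # [[2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]
print(engine.shuffles)  # 1
```

The point of the design is that `add` never touches engine internals: another engine would only need its own `repartition` and `zip_blocks`, and algorithms written against `add` would run on it unchanged.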
