Author: smarthi
Date: Fri Apr  8 18:40:36 2016
New Revision: 1738281

URL: http://svn.apache.org/viewvc?rev=1738281&view=rev
Log:
MAHOUT-1779: Brief overview of Mahout Flink Engine

Added:
    mahout/site/mahout_cms/trunk/content/users/flinkbindings/
    
mahout/site/mahout_cms/trunk/content/users/flinkbindings/flink-internals.mdtext

Added: 
mahout/site/mahout_cms/trunk/content/users/flinkbindings/flink-internals.mdtext
URL: 
http://svn.apache.org/viewvc/mahout/site/mahout_cms/trunk/content/users/flinkbindings/flink-internals.mdtext?rev=1738281&view=auto
==============================================================================
--- 
mahout/site/mahout_cms/trunk/content/users/flinkbindings/flink-internals.mdtext 
(added)
+++ 
mahout/site/mahout_cms/trunk/content/users/flinkbindings/flink-internals.mdtext 
Fri Apr  8 18:40:36 2016
@@ -0,0 +1,30 @@
+
+#Introduction
+
+This document provides an overview of how the Mahout Samsara environment is 
implemented over the Apache Flink backend engine, including the layout of the 
Flink backend's source code, which can be found under the flink/ directory in 
the Mahout codebase.
+
+Apache Flink is a distributed big data streaming engine that supports both 
streaming and batch interfaces. Batch processing is implemented as an extension 
of Flink’s stream processing engine.
+
+The Mahout Flink integration presently supports Flink’s batch processing 
capabilities, using the DataSet API.
+
+The Mahout DRM, or Distributed Row Matrix, is an abstraction for storing a 
large matrix of numbers in memory across a cluster by distributing logical rows 
among servers. Mahout's Scala DSL defines an abstract API on DRMs, and each 
backend engine provides an implementation of that API; the Spark backend engine 
is one example. Each engine has its own design for mapping the abstract API 
onto its data model and provides implementations of the algebraic operators 
over that mapping.
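+
+The idea of distributing logical rows can be illustrated with a toy model in 
plain Scala. This is only a sketch and not Mahout's implementation: the names 
ToyDrm, distribute and ata are hypothetical, an in-process Map stands in for 
the servers of a cluster, and rows are assigned by key modulo the partition 
count purely for illustration. It shows how an algebraic result such as A'A 
can be assembled from partial results computed independently on each partition.
+
```scala
// Toy model of a distributed row matrix (DRM). NOT Mahout's implementation:
// all names here are hypothetical and a Map stands in for the cluster.
object ToyDrm {
  type Row = (Int, Vector[Double]) // (row key, row values)
  type Matrix = Vector[Vector[Double]]

  // Assign each logical row to one of `numPartitions` "servers" by its key.
  def distribute(rows: Seq[Row], numPartitions: Int): Map[Int, Seq[Row]] =
    rows.groupBy { case (key, _) => key % numPartitions }

  // Outer product of a row with itself: that row's contribution to A'A.
  def outer(v: Vector[Double]): Matrix =
    v.map(a => v.map(b => a * b))

  // Element-wise sum of two matrices of equal shape.
  def add(x: Matrix, y: Matrix): Matrix =
    x.zip(y).map { case (r1, r2) => r1.zip(r2).map { case (a, b) => a + b } }

  // A'A: each partition sums the outer products of its own rows, then the
  // per-partition partial results are combined. Assumes at least one row.
  def ata(partitions: Map[Int, Seq[Row]]): Matrix =
    partitions.values
      .map(part => part.map { case (_, v) => outer(v) }.reduce(add))
      .reduce(add)

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      (0, Vector(1.0, 2.0)),
      (1, Vector(3.0, 4.0)),
      (2, Vector(5.0, 6.0)))
    // prints Vector(Vector(35.0, 44.0), Vector(44.0, 56.0))
    println(ata(distribute(rows, 2)))
  }
}
```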
+
+#Flink Environment Engine
+
+The Flink backend implements the abstract DRM as a Flink DataSet. A Flink job 
runs in the context of an ExecutionEnvironment (from the Flink Batch processing 
API).
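+
+As a rough illustration of what running in the context of an 
ExecutionEnvironment means, the sketch below builds a small DataSet of keyed 
rows with the Flink Scala batch API and applies a per-row operation. It assumes 
the Flink Scala DataSet API is on the classpath. The (Int, Array[Double]) row 
layout and the names DataSetSketch and rowSums are assumptions made for this 
example and are not taken from the Mahout Flink bindings.
+
```scala
import org.apache.flink.api.scala._

// Hypothetical example object; not part of Mahout or Flink.
object DataSetSketch {
  // Build a DataSet of (row key, row values) pairs, sum each row on the
  // engine, and collect the results back to the client.
  def rowSums(env: ExecutionEnvironment): Seq[(Int, Double)] = {
    val rows: DataSet[(Int, Array[Double])] = env.fromCollection(Seq(
      (0, Array(1.0, 2.0)),
      (1, Array(3.0, 4.0))))
    // collect() triggers execution of the job in this environment.
    rows.map(r => (r._1, r._2.sum)).collect()
  }

  def main(args: Array[String]): Unit = {
    // Returns a local or cluster environment depending on how the job is run.
    val env = ExecutionEnvironment.getExecutionEnvironment
    // Sort on the client, since the order of collected elements is not guaranteed.
    rowSums(env).sortBy(_._1).foreach(println)
  }
}
```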
+
+#Source Layout
+
+Within mahout.git, the top-level directory flink/ holds all the source code 
for the Flink backend engine. Sections of code that interface with the rest of 
the Mahout components are in Scala, and sections of the code that interface 
with the Flink DataSet API and implement algebraic operators are in Java. Here 
is a brief overview of the functionality found within the flink/ folder.
+
+flink/ - top-level directory containing all Flink-related code
+
+flink/src/main/scala/org/apache/mahout/flinkbindings/blas/*.scala - Physical 
operator code for the Samsara DSL algebra
+
+flink/src/main/scala/org/apache/mahout/flinkbindings/drm/*.scala - Flink 
DataSet DRM and broadcast implementation
+
+flink/src/main/scala/org/apache/mahout/flinkbindings/io/*.scala - Read / Write 
between DRMDataSet and files on HDFS
+
+flink/src/main/scala/org/apache/mahout/flinkbindings/FlinkEngine.scala - DSL 
operator graph evaluator and various abstract API implementations for a 
distributed engine.
+
+

