Author: smarthi
Date: Fri Apr 8 19:10:12 2016
New Revision: 1738287
URL: http://svn.apache.org/viewvc?rev=1738287&view=rev
Log:
CMS commit to mahout by smarthi
Modified:
mahout/site/mahout_cms/trunk/content/users/flinkbindings/flink-internals.mdtext
Modified:
mahout/site/mahout_cms/trunk/content/users/flinkbindings/flink-internals.mdtext
URL:
http://svn.apache.org/viewvc/mahout/site/mahout_cms/trunk/content/users/flinkbindings/flink-internals.mdtext?rev=1738287&r1=1738286&r2=1738287&view=diff
==============================================================================
--- mahout/site/mahout_cms/trunk/content/users/flinkbindings/flink-internals.mdtext (original)
+++ mahout/site/mahout_cms/trunk/content/users/flinkbindings/flink-internals.mdtext Fri Apr 8 19:10:12 2016
@@ -1,4 +1,3 @@
-
#Introduction
This document provides an overview of how the Mahout Samsara environment is
implemented over the Apache Flink backend engine. It also describes the code
layout of the Flink backend engine, whose source code can be found under the
/flink directory in the Mahout codebase.
@@ -9,6 +8,20 @@ The Mahout Flink integration presently s
The Mahout DRM, or Distributed Row Matrix, is an abstraction for storing a
large matrix of numbers in-memory in a cluster by distributing logical rows
among servers. Mahout's Scala DSL defines an abstract API on DRMs, and backend
engines provide implementations of this API. An example is the Spark backend
engine. Each engine has its own design for mapping the abstract API onto its
data model and provides implementations of the algebraic operators over that
mapping.
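
The idea of distributing logical rows and implementing an abstract API per
engine can be illustrated with a minimal sketch. This is plain Python, not
Mahout or Flink code: the names `DrmLike`, `PartitionedDrm` and `times_vector`
are hypothetical, and the in-process "partitions" only stand in for the servers
of a real cluster.

```python
from abc import ABC, abstractmethod
from typing import Dict, List

Row = List[float]


class DrmLike(ABC):
    """Hypothetical stand-in for the abstract DRM API that a backend implements."""

    @abstractmethod
    def nrow(self) -> int:
        ...

    @abstractmethod
    def times_vector(self, v: List[float]) -> List[float]:
        ...


class PartitionedDrm(DrmLike):
    """Toy 'engine': logical rows are spread round-robin over partitions."""

    def __init__(self, rows: List[Row], num_partitions: int) -> None:
        # Each partition maps a row key to the row it holds.
        self.partitions: List[Dict[int, Row]] = [{} for _ in range(num_partitions)]
        for key, row in enumerate(rows):
            self.partitions[key % num_partitions][key] = row

    def nrow(self) -> int:
        return sum(len(part) for part in self.partitions)

    def times_vector(self, v: List[float]) -> List[float]:
        # Every partition computes dot products only for the rows it owns;
        # the results are then reassembled in row-key order.
        partial: Dict[int, float] = {}
        for part in self.partitions:
            for key, row in part.items():
                partial[key] = sum(a * b for a, b in zip(row, v))
        return [partial[key] for key in sorted(partial)]


drm = PartitionedDrm([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], num_partitions=2)
result = drm.times_vector([1.0, 1.0])
```

A different engine would keep the same abstract interface but swap the
partition dictionaries for its own data model, which is the role the Flink
DataSet plays in the Flink backend.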
+#Flink Overview
+
+Apache Flink is an open source, distributed stream and batch processing
framework. At its core, Flink is a stream processing engine, and batch
processing is implemented as an extension of stream processing.
+
+Flink includes several APIs for building applications with the Flink Engine:
+
+ <ol>
+<li><b>DataSet API</b> for Batch data in Java, Scala and Python</li>
+<li><b>DataStream API</b> for Stream Processing in Java and Scala</li>
+<li><b>Table API</b> with a SQL-like expression language in Java and
Scala</li>
+<li><b>Gelly</b> Graph Processing API in Java and Scala</li>
+<li><b>CEP API</b>, a complex event processing library</li>
+<li><b>FlinkML</b>, a Machine Learning library</li>
+</ol>
#Flink Environment Engine
The Flink backend implements the abstract DRM as a Flink DataSet. A Flink job
runs in the context of an ExecutionEnvironment (from the Flink Batch processing
API).