[BEAM-506] Fill in the documentation/runners/flink portion of the website

Project: http://git-wip-us.apache.org/repos/asf/incubator-beam-site/repo
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam-site/tree/ac0c4e06
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam-site/diff/ac0c4e06

Branch: refs/heads/asf-site
Commit: ac0c4e063459ca251354b94eed866c0934548fec
Parents: 1b458f1
Author: Aljoscha Krettek <aljoscha.kret...@gmail.com>
Authored: Tue Nov 29 16:23:03 2016 +0100
Committer: Dan Halperin <dhalp...@google.com>
Committed: Thu Dec 1 14:30:22 2016 -0800

 src/documentation/runners/flink.md | 136 +++++++++++++++++++++++++++++++-
 1 file changed, 135 insertions(+), 1 deletion(-)

diff --git a/src/documentation/runners/flink.md b/src/documentation/runners/flink.md
index 4145be6..a984bb4 100644
--- a/src/documentation/runners/flink.md
+++ b/src/documentation/runners/flink.md
@@ -6,4 +6,138 @@ redirect_from: /learn/runners/flink/
 # Using the Apache Flink Runner
-This page is under construction 
+The Apache Flink Runner can be used to execute Beam pipelines using [Apache Flink](https://flink.apache.org). When using the Flink Runner you will create a jar file containing your job that can be executed on a regular Flink cluster. It's also possible to execute a Beam pipeline using Flink's local execution mode without setting up a cluster. This is helpful for development and debugging of your pipeline.
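
To illustrate how a pipeline is pointed at the Flink Runner, here is a minimal sketch (not part of this page; it assumes the Beam Java SDK and the `beam-runners-flink_2.10` artifact are on the classpath, and the class name is hypothetical):

```java
import org.apache.beam.runners.flink.FlinkPipelineOptions;
import org.apache.beam.runners.flink.FlinkRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class FlinkRunnerSketch {
  public static void main(String[] args) {
    // Picks up options such as --runner=FlinkRunner and --flinkMaster=... from the command line.
    FlinkPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).as(FlinkPipelineOptions.class);
    // Alternatively, hard-code the runner instead of passing --runner:
    options.setRunner(FlinkRunner.class);

    Pipeline p = Pipeline.create(options);
    // ... apply your transforms to p here, then:
    p.run();
  }
}
```

Packaged into a fat jar, the same program can run in Flink's local execution mode (`--flinkMaster=[local]`) or be submitted to a cluster.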
+The Flink Runner and Flink are suitable for large scale, continuous jobs, and provide:
+
+* A streaming-first runtime that supports both batch processing and data streaming programs
+* A runtime that supports very high throughput and low event latency at the same time
+* Fault-tolerance with *exactly-once* processing guarantees
+* Natural back-pressure in streaming programs
+* Custom memory management for efficient and robust switching between in-memory and out-of-core data processing algorithms
+* Integration with YARN and other components of the Apache Hadoop ecosystem
+The [Beam Capability Matrix]({{ site.baseurl }}/documentation/runners/capability-matrix/) documents the supported capabilities of the Flink Runner.
+## Flink Runner prerequisites and setup
+If you want to use the local execution mode with the Flink Runner you don't have to complete any setup.
+
+To use the Flink Runner for executing on a cluster, you have to set up a Flink cluster by following the Flink setup quickstart.
+To find out which version of Flink you need, you can run this command to check the version of the Flink dependency that your project is using:
+
+```
+$ mvn dependency:tree -Pflink-runner |grep flink
+[INFO] |  +- org.apache.flink:flink-streaming-java_2.10:jar:1.1.2:runtime
+```
+
+Here, we would need Flink 1.1.2. For more information, the Flink documentation can be helpful.
+### Specify your dependency
+
+You must specify your dependency on the Flink Runner in your `pom.xml`:
+
+```xml
+<dependency>
+  <groupId>org.apache.beam</groupId>
+  <artifactId>beam-runners-flink_2.10</artifactId>
+  <version>{{ site.release_latest }}</version>
+  <scope>runtime</scope>
+</dependency>
+```
+## Executing a pipeline on a Flink cluster
+
+For executing a pipeline on a Flink cluster you need to package your program along with all dependencies in a so-called fat jar. How you do this depends on your build system, but if you follow along the [Beam Quickstart]({{ site.baseurl }}/get-started/quickstart/) this is the command that you have to run:
+
+```
+$ mvn package -Pflink-runner
+```
+
+The Beam Quickstart Maven project is set up to use the Maven Shade plugin to create a fat jar, and the `-Pflink-runner` argument makes sure to include the dependency on the Flink Runner.
+For actually running the pipeline you would use this command:
+
+```
+$ mvn exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
+    -Pflink-runner \
+    -Dexec.args="--runner=FlinkRunner \
+      --inputFile=/path/to/pom.xml \
+      --output=/path/to/counts \
+      --flinkMaster=<flink master url> \
+      --filesToStage=target/word-count-beam-bundled-0.1.jar"
+```
+
+If you have a Flink `JobManager` running on your local machine you can give `localhost:6123` for `--flinkMaster`.
+## Pipeline options for the Flink Runner
+
+When executing your pipeline with the Flink Runner, you can set these pipeline options:
+<table class="table table-bordered">
+<tr>
+  <th>Field</th>
+  <th>Description</th>
+  <th>Default Value</th>
+</tr>
+<tr>
+  <td><code>runner</code></td>
+  <td>The pipeline runner to use. This option allows you to determine the pipeline runner at runtime.</td>
+  <td>Set to <code>FlinkRunner</code> to run using Flink.</td>
+</tr>
+<tr>
+  <td><code>streaming</code></td>
+  <td>Whether streaming mode is enabled or disabled; <code>true</code> if enabled. Set to <code>true</code> if running pipelines with unbounded <code>PCollection</code>s.</td>
+  <td><code>false</code></td>
+</tr>
+<tr>
+  <td><code>flinkMaster</code></td>
+  <td>The url of the Flink JobManager on which to execute pipelines. This can either be the address of a cluster JobManager, in the form <code>"host:port"</code>, or one of the special Strings <code>"[local]"</code> or <code>"[auto]"</code>. <code>"[local]"</code> will start a local Flink Cluster in the JVM while <code>"[auto]"</code> will let the system decide where to execute the pipeline based on the environment.</td>
+  <td><code>[auto]</code></td>
+</tr>
+<tr>
+  <td><code>filesToStage</code></td>
+  <td>Jar Files to send to all workers and put on the classpath. Here you have to put the fat jar that contains your program along with all dependencies.</td>
+  <td>empty</td>
+</tr>
+<tr>
+  <td><code>parallelism</code></td>
+  <td>The degree of parallelism to be used when distributing operations onto workers.</td>
+  <td><code>1</code></td>
+</tr>
+<tr>
+  <td><code>checkpointingInterval</code></td>
+  <td>The interval between consecutive checkpoints (i.e. snapshots of the current pipeline state used for fault tolerance).</td>
+  <td><code>-1L</code>, i.e. disabled</td>
+</tr>
+<tr>
+  <td><code>numberOfExecutionRetries</code></td>
+  <td>Sets the number of times that failed tasks are re-executed. A value of <code>0</code> effectively disables fault tolerance. A value of <code>-1</code> indicates that the system default value (as defined in the configuration) should be used.</td>
+  <td><code>-1</code></td>
+</tr>
+<tr>
+  <td><code>executionRetryDelay</code></td>
+  <td>Sets the delay between executions. A value of <code>-1</code> indicates that the default value should be used.</td>
+  <td><code>-1</code></td>
+</tr>
+<tr>
+  <td><code>stateBackend</code></td>
+  <td>Sets the state backend to use in streaming mode. The default is to read this setting from the Flink config.</td>
+  <td><code>empty</code>, i.e. read from Flink config</td>
+</tr>
+</table>
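
As a sketch of how these options can also be set programmatically rather than on the command line (assuming the same Beam Java SDK and Flink Runner dependencies as above; the class name and values are illustrative only, not recommendations):

```java
import org.apache.beam.runners.flink.FlinkPipelineOptions;
import org.apache.beam.runners.flink.FlinkRunner;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class FlinkOptionsSketch {
  public static void main(String[] args) {
    FlinkPipelineOptions options = PipelineOptionsFactory.as(FlinkPipelineOptions.class);
    options.setRunner(FlinkRunner.class);
    options.setFlinkMaster("[local]");         // embedded Flink cluster in this JVM
    options.setStreaming(true);                // needed for unbounded PCollections
    options.setParallelism(4);
    options.setCheckpointingInterval(10000L);  // enable checkpointing
  }
}
```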
+See the reference documentation for the <span class="language-java">[FlinkPipelineOptions]({{ site.baseurl }}/documentation/sdks/javadoc/{{ site.release_latest }}/index.html?org/apache/beam/runners/flink/FlinkPipelineOptions.html)</span> interface (and its subinterfaces) for the complete list of pipeline configuration options.
+## Additional information and caveats
+### Monitoring your job
+You can monitor a running Flink job using the Flink JobManager Dashboard. By default, this is available at port `8081` of the JobManager node. If you have a Flink installation on your local machine that would be `http://localhost:8081`.
+### Streaming Execution
+If your pipeline uses an unbounded data source or sink, the Flink Runner will automatically switch to streaming mode. You can enforce streaming mode by using the `streaming` setting mentioned above.
