This is an automated email from the ASF dual-hosted git repository. git-site-role pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/beam.git
The following commit(s) were added to refs/heads/asf-site by this push: new 67d9c52 Publishing website 2019/01/30 14:22:33 at commit 095664b 67d9c52 is described below commit 67d9c52a20c18406352d924935960688a1f6e9a2 Author: jenkins <bui...@apache.org> AuthorDate: Wed Jan 30 14:22:34 2019 +0000 Publishing website 2019/01/30 14:22:33 at commit 095664b --- .../documentation/runners/flink/index.html | 323 +++++++++++++++++++-- 1 file changed, 296 insertions(+), 27 deletions(-) diff --git a/website/generated-content/documentation/runners/flink/index.html b/website/generated-content/documentation/runners/flink/index.html index a32b5fd..0936d64 100644 --- a/website/generated-content/documentation/runners/flink/index.html +++ b/website/generated-content/documentation/runners/flink/index.html @@ -188,13 +188,13 @@ </ul> </li> <li><a href="#executing-a-pipeline-on-a-flink-cluster">Executing a pipeline on a Flink cluster</a></li> - <li><a href="#pipeline-options-for-the-flink-runner">Pipeline options for the Flink Runner</a></li> <li><a href="#additional-information-and-caveats">Additional information and caveats</a> <ul> <li><a href="#monitoring-your-job">Monitoring your job</a></li> <li><a href="#streaming-execution">Streaming Execution</a></li> </ul> </li> + <li><a href="#pipeline-options-for-the-flink-runner">Pipeline options for the Flink Runner</a></li> </ul> @@ -243,51 +243,99 @@ limitations under the License. <h2 id="flink-runner-prerequisites-and-setup">Flink Runner prerequisites and setup</h2> -<p>If you want to use the local execution mode with the Flink runner to don’t have to complete any setup.</p> +<p>If you want to use the local execution mode with the Flink Runner you don’t have to complete any setup. +You can simply run your Beam pipeline. Be sure to set the Runner to <code class="highlighter-rouge">FlinkRunner</code>.</p> -<p>To use the Flink Runner for executing on a cluster, you have to setup a Flink cluster by following the Flink <a href="https://ci.apache.org/projects/flink/flink-docs-stable/quickstart/setup_quickstart.html">setup quickstart</a>.</p> +<p>To use the Flink Runner for executing on a cluster, you have to setup a Flink cluster by following the +Flink <a href="https://ci.apache.org/projects/flink/flink-docs-stable/quickstart/setup_quickstart.html#setup-download-and-start-flink">Setup Quickstart</a>.</p> <h3 id="version-compatibility">Version Compatibility</h3> -<p>The Flink cluster version has to match the version used by the FlinkRunner. To find out which version of Flink please see the table below:</p> +<p>The Flink cluster version has to match the minor version used by the FlinkRunner. +The minor version is the first two numbers in the version string, e.g. in <code class="highlighter-rouge">1.7.0</code> the +minor version is <code class="highlighter-rouge">1.7</code>.</p> + +<p>We try to track the latest version of Apache Flink at the time of the Beam release. +A Flink version is supported by Beam for the time it is supported by the Flink community. +The Flink community typially supports the last two minor versions. When support for a Flink +version is dropped, it may be deprecated and removed also from Beam, with the exception of +Beam LTS releases. LTS releases continue to receive bug fixes for long as the LTS support +period.</p> + +<p>To find out which version of Flink is compatible with Beam please see the table below:</p> <table class="table table-bordered"> <tr> - <th>Beam version</th> - <th>Flink version</th> + <th>Beam Version</th> + <th>Flink Version</th> + <th>Artifact Id</th> </tr> <tr> - <td>2.8.0, 2.7.0, 2.6.0</td> + <td rowspan="3">2.10.0</td> <td>1.5.x</td> + <td>beam-runners-flink_2.11</td> +</tr> +<tr> + <td>1.6.x</td> + <td>beam-runners-flink-1.6</td> +</tr> +<tr> + <td>1.7.x</td> + <td>beam-runners-flink-1.7</td> +</tr> +<tr> + <td>2.9.0</td> + <td rowspan="4">1.5.x</td> + <td rowspan="4">beam-runners-flink_2.11</td> +</tr> +<tr> + <td>2.8.0</td> +</tr> +<tr> + <td>2.7.0</td> +</tr> +<tr> + <td>2.6.0</td> </tr> <tr> - <td>2.5.0, 2.4.0, 2.3.0</td> - <td>1.4.x</td> + <td>2.5.0</td> + <td rowspan="3">1.4.x with Scala 2.11</td> + <td rowspan="3">beam-runners-flink_2.11</td> +</tr> +<tr> + <td>2.4.0</td> +</tr> +<tr> + <td>2.3.0</td> </tr> <tr> <td>2.2.0</td> - <td>1.3.x with Scala 2.10</td> + <td rowspan="2">1.3.x with Scala 2.10</td> + <td rowspan="2">beam-runners-flink_2.10</td> </tr> <tr> - <td>2.2.0, 2.1.x</td> - <td>1.3.x with Scala 2.10</td> + <td>2.1.x</td> </tr> <tr> <td>2.0.0</td> <td>1.2.x with Scala 2.10</td> + <td>beam-runners-flink_2.10</td> </tr> </table> -<p>For retrieving the right version, see the <a href="https://flink.apache.org/downloads.html">Flink downloads page</a>.</p> +<p>For retrieving the right Flink version, see the <a href="https://flink.apache.org/downloads.html">Flink downloads page</a>.</p> <p>For more information, the <a href="https://ci.apache.org/projects/flink/flink-docs-stable/">Flink Documentation</a> can be helpful.</p> <h3 id="specify-your-dependency">Specify your dependency</h3> <p><span class="language-java">When using Java, you must specify your dependency on the Flink Runner in your <code class="highlighter-rouge">pom.xml</code>.</span></p> + +<p>Use the Beam version and the artifact id from the above table. For example:</p> + <div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="o"><</span><span class="n">dependency</span><span class="o">></span> <span class="o"><</span><span class="n">groupId</span><span class="o">></span><span class="n">org</span><span class="o">.</span><span class="na">apache</span><span class="o">.</span><span class="na">beam</span><span class="o"></</span><span class="n">groupId</span><span class="o">></span> - <span class="o"><</span><span class="n">artifactId</span><span class="o">></span><span class="n">beam</span><span class="o">-</span><span class="n">runners</span><span class="o">-</span><span class="n">flink_2</span><span class="o">.</span><span class="mi">11</span><span class="o"></</span><span class="n">artifactId</span><span class="o">></span> + <span class="o"><</span><span class="n">artifactId</span><span class="o">></span><span class="n">beam</span><span class="o">-</span><span class="n">runners</span><span class="o">-</span><span class="n">flink</span><span class="o">-</span><span class="mf">1.6</span><span class="o"></</span><span class="n">artifactId</span><span class="o">></span> <span class="o"><</span><span class="n">version</span><span class="o">></span><span class="mf">2.9</span><span class="o">.</span><span class="mi">0</span><span class="o"></</span><span class="n">version</span><span class="o">></span> <span class="o"></</span><span class="n">dependency</span><span class="o">></span> </code></pre> @@ -314,13 +362,31 @@ limitations under the License. --filesToStage=target/word-count-beam-bundled-0.1.jar" </code></pre> </div> -<p>If you have a Flink <code class="highlighter-rouge">JobManager</code> running on your local machine you can give <code class="highlighter-rouge">localhost:8081</code> for -<code class="highlighter-rouge">flinkMaster</code>.</p> +<p>If you have a Flink <code class="highlighter-rouge">JobManager</code> running on your local machine you can provide <code class="highlighter-rouge">localhost:8081</code> for +<code class="highlighter-rouge">flinkMaster</code>. Otherwise an embedded Flink cluster will be started for the WordCount job.</p> + +<h2 id="additional-information-and-caveats">Additional information and caveats</h2> + +<h3 id="monitoring-your-job">Monitoring your job</h3> + +<p>You can monitor a running Flink job using the Flink JobManager Dashboard or its Rest interfaces. By default, this is available at port <code class="highlighter-rouge">8081</code> of the JobManager node. If you have a Flink installation on your local machine that would be <code class="highlighter-rouge">http://localhost:8081</code>. Note: When you use the <code class="highlighter-rouge">[local]</code> mode an embedded Flink cluster will be started which does not make a dashboard availa [...] + +<h3 id="streaming-execution">Streaming Execution</h3> + +<p>If your pipeline uses an unbounded data source or sink, the Flink Runner will automatically switch to streaming mode. You can enforce streaming mode by using the <code class="highlighter-rouge">streaming</code> setting mentioned below.</p> <h2 id="pipeline-options-for-the-flink-runner">Pipeline options for the Flink Runner</h2> <p>When executing your pipeline with the Flink Runner, you can set these pipeline options.</p> +<p>See the reference documentation for the<span class="language-java"> +<a href="https://beam.apache.org/releases/javadoc/2.9.0/index.html?org/apache/beam/runners/flink/FlinkPipelineOptions.html">FlinkPipelineOptions</a> +</span><span class="language-py"> +<a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/options/pipeline_options.py">PipelineOptions</a> +</span>interface (and its subinterfaces) for the complete list of pipeline configuration options.</p> + +<!-- Java Options --> +<div class="language-java"> <table class="table table-bordered"> <tr> <th>Field</th> @@ -347,11 +413,18 @@ limitations under the License. <td>Jar Files to send to all workers and put on the classpath. Here you have to put the fat jar that contains your program along with all dependencies.</td> <td>empty</td> </tr> - <tr> <td><code>parallelism</code></td> <td>The degree of parallelism to be used when distributing operations onto workers.</td> - <td><code>1</code></td> + <td>For local execution: <code>Number of available CPU cores</code> + For remote execution: <code>Default parallelism configuerd at remote cluster</code> + Otherwise: <code>1</code> + </td> +</tr> +<tr> + <td><code>maxParallelism</code></td> + <td>The pipeline wide maximum degree of parallelism to be used. The maximum parallelism specifies the upper limit for dynamic scaling and the number of key groups used for partitioned state.</td> + <td><code>-1L</code>, meaning same as the parallelism</td> </tr> <tr> <td><code>checkpointingInterval</code></td> @@ -359,6 +432,30 @@ limitations under the License. <td><code>-1L</code>, i.e. disabled</td> </tr> <tr> + <td><code>checkpointMode</code></td> + <td>The checkpointing mode that defines consistency guarantee.</td> + <td><code>EXACTLY_ONCE</code></td> +</tr> +<tr> + <td><code>checkpointTimeoutMillis</code></td> + <td>The maximum time in milliseconds that a checkpoint may take before being discarded</td> + <td><code>-1</code>, the cluster default</td> +</tr> +<tr> + <td><code>minPauseBetweenCheckpoints</code></td> + <td>The minimal pause in milliseconds before the next checkpoint is triggered.</td> + <td><code>-1</code>, the cluster default</td> +</tr> +<tr> + <td><code>failOnCheckpointingErrors</code></td> + <td> + Sets the expected behaviour for tasks in case that they encounter an error in their + checkpointing procedure. If this is set to true, the task will fail on checkpointing error. + If this is set to false, the task will only decline a the checkpoint and continue running. + </td> + <td><code>-1</code>, the cluster default</td> +</tr> +<tr> <td><code>numberOfExecutionRetries</code></td> <td>Sets the number of times that failed tasks are re-executed. A value of <code>0</code> effectively disables fault tolerance. A value of <code>-1</code> indicates that the system default value (as defined in the configuration) should be used.</td> <td><code>-1</code></td> @@ -369,23 +466,195 @@ limitations under the License. <td><code>-1</code></td> </tr> <tr> + <td><code>objectReuse</code></td> + <td>Sets the behavior of reusing objects.</td> + <td><code>false</code>, no Object reuse</td> +</tr> +<tr> <td><code>stateBackend</code></td> <td>Sets the state backend to use in streaming mode. The default is to read this setting from the Flink config.</td> <td><code>empty</code>, i.e. read from Flink config</td> </tr> +<tr> + <td><code>enableMetrics</code></td> + <td>Enable/disable Beam metrics in Flink Runner</td> + <td>Default: <code>true</code></td> +</tr> +<tr> + <td><code>externalizedCheckpointsEnabled</code></td> + <td>Enables or disables externalized checkpoints. Works in conjunction with CheckpointingInterval</td> + <td>Default: <code>false</code></td> +</tr> +<tr> + <td><code>retainExternalizedCheckpointsOnCancellation</code></td> + <td>Sets the behavior of externalized checkpoints on cancellation.</td> + <td>Default: <code>false</code></td> +</tr> +<tr> + <td><code>maxBundleSize</code></td> + <td>The maximum number of elements in a bundle.</td> + <td>Default: <code>1000</code></td> +</tr> +<tr> + <td><code>maxBundleTimeMills</code></td> + <td>The maximum time to wait before finalising a bundle (in milliseconds).</td> + <td>Default: <code>1000</code></td> +</tr> +<tr> + <td><code>shutdownSourcesOnFinalWatermark</code></td> + <td>If set, shutdown sources when their watermark reaches +Inf.</td> + <td>Default: <code>false</code></td> +</tr> +<tr> + <td><code>latencyTrackingInterval</code></td> + <td>Interval in milliseconds for sending latency tracking marks from the sources to the sinks. Interval value <= 0 disables the feature.</td> + <td>Default: <code>0</code></td> +</tr> +<tr> + <td><code>autoWatermarkInterval</code></td> + <td>The interval in milliseconds for automatic watermark emission.</td> +</tr> +<tr> + <td><code>executionModeForBatch</code></td> + <td>Flink mode for data exchange of batch pipelines. Reference {@link org.apache.flink.api.common.ExecutionMode}. Set this to BATCH_FORCED if pipelines get blocked, see https://issues.apache.org/jira/browse/FLINK-10672</td> + <td>Default: <code>PIPELINED</code></td> +</tr> +<tr> + <td><code>savepointPath</code></td> + <td>Savepoint restore path. If specified, restores the streaming pipeline from the provided path.</td> + <td>Default: None</td> +</tr> +<tr> + <td><code>allowNonRestoredState</code></td> + <td>Flag indicating whether non restored state is allowed if the savepoint contains state for an operator that is no longer part of the pipeline.</td> + <td>Default: <code>false</code></td> +</tr> </table> +</div> -<p>See the reference documentation for the <span class="language-java"><a href="https://beam.apache.org/releases/javadoc/2.9.0/index.html?org/apache/beam/runners/flink/FlinkPipelineOptions.html">FlinkPipelineOptions</a></span><span class="language-py"><a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/options/pipeline_options.py">PipelineOptions</a></span> interface (and its subinterfaces) for the complete list of pipeline configuration options.</p> - -<h2 id="additional-information-and-caveats">Additional information and caveats</h2> - -<h3 id="monitoring-your-job">Monitoring your job</h3> - -<p>You can monitor a running Flink job using the Flink JobManager Dashboard. By default, this is available at port <code class="highlighter-rouge">8081</code> of the JobManager node. If you have a Flink installation on your local machine that would be <code class="highlighter-rouge">http://localhost:8081</code>.</p> +<!-- Python Options --> +<div class="language-py"> +<table class="table table-bordered"> -<h3 id="streaming-execution">Streaming Execution</h3> +<tr> + <td><code>files_to_stage</code></td> + <td>Jar-Files to send to all workers and put on the classpath. The default value is all files from the classpath.</td> +</tr> +<tr> + <td><code>flink_master</code></td> + <td>Address of the Flink Master where the Pipeline should be executed. Can either be of the form "host:port" or one of the special values [local], [collection] or [auto].</td> + <td>Default: <code>[auto]</code></td> +</tr> +<tr> + <td><code>parallelism</code></td> + <td>The degree of parallelism to be used when distributing operations onto workers. If the parallelism is not set, the configured Flink default is used, or 1 if none can be found.</td> + <td>Default: <code>-1</code></td> +</tr> +<tr> + <td><code>max_parallelism</code></td> + <td>The pipeline wide maximum degree of parallelism to be used. The maximum parallelism specifies the upper limit for dynamic scaling and the number of key groups used for partitioned state.</td> + <td>Default: <code>-1</code></td> +</tr> +<tr> + <td><code>checkpointing_interval</code></td> + <td>The interval in milliseconds at which to trigger checkpoints of the running pipeline. Default: No checkpointing.</td> + <td>Default: <code>-1</code></td> +</tr> +<tr> + <td><code>checkpointing_mode</code></td> + <td>The checkpointing mode that defines consistency guarantee.</td> + <td>Default: <code>EXACTLY_ONCE</code></td> +</tr> +<tr> + <td><code>checkpoint_timeout_millis</code></td> + <td>The maximum time in milliseconds that a checkpoint may take before being discarded.</td> + <td>Default: <code>-1</code></td> +</tr> +<tr> + <td><code>min_pause_between_checkpoints</code></td> + <td>The minimal pause in milliseconds before the next checkpoint is triggered.</td> + <td>Default: <code>-1</code></td> +</tr> +<tr> + <td><code>fail_on_checkpointing_errors</code></td> + <td>Sets the expected behaviour for tasks in case that they encounter an error in their checkpointing procedure. If this is set to true, the task will fail on checkpointing error. If this is set to false, the task will only decline a the checkpoint and continue running. </td> + <td>Default: <code>true</code></td> +</tr> +<tr> + <td><code>number_of_execution_retries</code></td> + <td>Sets the number of times that failed tasks are re-executed. A value of zero effectively disables fault tolerance. A value of -1 indicates that the system default value (as defined in the configuration) should be used.</td> + <td>Default: <code>-1</code></td> +</tr> +<tr> + <td><code>execution_retry_delay</code></td> + <td>Sets the delay in milliseconds between executions. A value of {@code -1} indicates that the default value should be used.</td> + <td>Default: <code>-1</code></td> +</tr> +<tr> + <td><code>object_reuse</code></td> + <td>Sets the behavior of reusing objects.</td> + <td>Default: <code>false</code></td> +</tr> +<tr> + <td><code>state_backend</code></td> + <td>Sets the state backend to use in streaming mode. Otherwise the default is read from the Flink config.</td> +</tr> +<tr> + <td><code>enable_metrics</code></td> + <td>Enable/disable Beam metrics in Flink Runner</td> + <td>Default: <code>true</code></td> +</tr> +<tr> + <td><code>externalized_checkpoints_enabled</code></td> + <td>Enables or disables externalized checkpoints. Works in conjunction with CheckpointingInterval</td> + <td>Default: <code>false</code></td> +</tr> +<tr> + <td><code>retain_externalized_checkpoints_on_cancellation</code></td> + <td>Sets the behavior of externalized checkpoints on cancellation.</td> + <td>Default: <code>false</code></td> +</tr> +<tr> + <td><code>max_bundle_size</code></td> + <td>The maximum number of elements in a bundle.</td> + <td>Default: <code>1000</code></td> +</tr> +<tr> + <td><code>max_bundle_time_mills</code></td> + <td>The maximum time to wait before finalising a bundle (in milliseconds).</td> + <td>Default: <code>1000</code></td> +</tr> +<tr> + <td><code>shutdown_sources_on_final_watermark</code></td> + <td>If set, shutdown sources when their watermark reaches +Inf.</td> + <td>Default: <code>false</code></td> +</tr> +<tr> + <td><code>latency_tracking_interval</code></td> + <td>Interval in milliseconds for sending latency tracking marks from the sources to the sinks. Interval value <= 0 disables the feature.</td> + <td>Default: <code>0</code></td> +</tr> +<tr> + <td><code>auto_watermark_interval</code></td> + <td>The interval in milliseconds for automatic watermark emission.</td> +</tr> +<tr> + <td><code>execution_mode_for_batch</code></td> + <td>Flink mode for data exchange of batch pipelines. Reference {@link org.apache.flink.api.common.ExecutionMode}. Set this to BATCH_FORCED if pipelines get blocked, see https://issues.apache.org/jira/browse/FLINK-10672</td> + <td>Default: <code>PIPELINED</code></td> +</tr> +<tr> + <td><code>savepoint_path</code></td> + <td>Savepoint restore path. If specified, restores the streaming pipeline from the provided path.</td> +</tr> +<tr> + <td><code>allow_non_restored_state</code></td> + <td>Flag indicating whether non restored state is allowed if the savepoint contains state for an operator that is no longer part of the pipeline.</td> + <td>Default: <code>false</code></td> +</tr> -<p>If your pipeline uses an unbounded data source or sink, the Flink Runner will automatically switch to streaming mode. You can enforce streaming mode by using the <code class="highlighter-rouge">streaming</code> setting mentioned above.</p> +</table> +</div> </div> </div>