[GitHub] [flink] Airblader commented on a change in pull request #16871: [FLINK-23832][docs] Update DataStream API Integration page

GitBox Wed, 18 Aug 2021 01:40:16 -0700


Airblader commented on a change in pull request #16871:
URL: https://github.com/apache/flink/pull/16871#discussion_r691015284




##########
File path: docs/content/docs/dev/table/data_stream_api.md
##########
@@ -574,6 +636,296 @@ env.execute()
 
 {{< top >}}
 
+Batch Runtime Mode
+------------------
+
+The *batch runtime mode* is a specialized execution mode for *bounded* Flink programs.
+
+Generally speaking, *boundedness* is a property of a data source that tells us whether all the records
+coming from that source are known before execution or whether new data will show up, potentially
+indefinitely. A job, in turn, is bounded if all its sources are bounded, and unbounded otherwise.
+
+*Streaming runtime mode*, on the other hand, can be used for both bounded and unbounded jobs.
+
+For more information on the different execution modes, see also the corresponding [DataStream API section]({{< ref "docs/dev/datastream/execution_mode" >}}).
+
+The Table API & SQL planner provides a set of specialized optimizer rules and runtime operators for either
+of the two modes.
+
+Currently, the runtime mode is not derived automatically from sources, thus, it must be set explicitly
+or will be adopted from `StreamExecutionEnvironment` when instantiating a `StreamTableEnvironment`:
+
+{{< tabs "f786b6ff-facc-4102-8833-3669f2fdef38" >}}
+{{< tab "Java" >}}
+```java
+import org.apache.flink.api.common.RuntimeExecutionMode;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
+import org.apache.flink.table.api.EnvironmentSettings;
+
+// adopt mode from StreamExecutionEnvironment
+StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
+env.setRuntimeMode(RuntimeExecutionMode.BATCH);
+StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);
+
+// or
+
+// set mode explicitly for StreamTableEnvironment
+// it will be propagated to StreamExecutionEnvironment during planning
+StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
+StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env, EnvironmentSettings.inBatchMode());
+```
+{{< /tab >}}
+{{< tab "Scala" >}}
+```scala
+import org.apache.flink.api.common.RuntimeExecutionMode
+import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
+import org.apache.flink.table.api.bridge.scala.StreamTableEnvironment
+import org.apache.flink.table.api.EnvironmentSettings
+
+// adopt mode from StreamExecutionEnvironment
+val env = StreamExecutionEnvironment.getExecutionEnvironment
+env.setRuntimeMode(RuntimeExecutionMode.BATCH)
+val tableEnv = StreamTableEnvironment.create(env)
+
+// or
+
+// set mode explicitly for StreamTableEnvironment
+val env = StreamExecutionEnvironment.getExecutionEnvironment
+val tableEnv = StreamTableEnvironment.create(env, EnvironmentSettings.inBatchMode)
+```
+{{< /tab >}}
+{{< /tabs >}}
+
+Setting the runtime mode has the following implications (among others):

Review comment:
       nit: "Setting _this_ runtime mode" or "Setting the runtime mode [to batch]"?

##########
File path: docs/content/docs/dev/table/data_stream_api.md
##########
@@ -574,6 +636,296 @@ env.execute()
 
 {{< top >}}
 
+Batch Runtime Mode
+------------------
+
+The *batch runtime mode* is a specialized execution mode for *bounded* Flink programs.
+
+Generally speaking, *boundedness* is a property of a data source that tells us whether all the records
+coming from that source are known before execution or whether new data will show up, potentially
+indefinitely. A job, in turn, is bounded if all its sources are bounded, and unbounded otherwise.
+
+*Streaming runtime mode*, on the other hand, can be used for both bounded and unbounded jobs.
+
+For more information on the different execution modes, see also the corresponding [DataStream API section]({{< ref "docs/dev/datastream/execution_mode" >}}).
+
+The Table API & SQL planner provides a set of specialized optimizer rules and runtime operators for either
+of the two modes.
+
+Currently, the runtime mode is not derived automatically from sources, thus, it must be set explicitly
+or will be adopted from `StreamExecutionEnvironment` when instantiating a `StreamTableEnvironment`:
+
+{{< tabs "f786b6ff-facc-4102-8833-3669f2fdef38" >}}
+{{< tab "Java" >}}
+```java
+import org.apache.flink.api.common.RuntimeExecutionMode;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
+import org.apache.flink.table.api.EnvironmentSettings;
+
+// adopt mode from StreamExecutionEnvironment
+StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
+env.setRuntimeMode(RuntimeExecutionMode.BATCH);
+StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);
+
+// or
+
+// set mode explicitly for StreamTableEnvironment
+// it will be propagated to StreamExecutionEnvironment during planning
+StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
+StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env, EnvironmentSettings.inBatchMode());
+```
+{{< /tab >}}
+{{< tab "Scala" >}}
+```scala
+import org.apache.flink.api.common.RuntimeExecutionMode
+import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
+import org.apache.flink.table.api.bridge.scala.StreamTableEnvironment
+import org.apache.flink.table.api.EnvironmentSettings
+
+// adopt mode from StreamExecutionEnvironment
+val env = StreamExecutionEnvironment.getExecutionEnvironment
+env.setRuntimeMode(RuntimeExecutionMode.BATCH)
+val tableEnv = StreamTableEnvironment.create(env)
+
+// or
+
+// set mode explicitly for StreamTableEnvironment
+val env = StreamExecutionEnvironment.getExecutionEnvironment
+val tableEnv = StreamTableEnvironment.create(env, EnvironmentSettings.inBatchMode)
+```
+{{< /tab >}}
+{{< /tabs >}}
+
+Setting the runtime mode has the following implications (among others):
+
+- All sources must declare themselves as bounded. Otherwise an exception is thrown.
+
+- Currently, table sources must emit insert-only changes. Otherwise an exception is thrown.
+
+- Progressive watermarks are neither generated nor used in operators. However, sources emit a maximum
+watermark before shutting down.
+
+- Exchanges between tasks might be blocking according to the [`execution.batch-shuffle-mode`]({{< ref "docs/deployment/config" >}}#execution-batch-shuffle-mode).
+This also means potentially less resource requirements compared to executing the same pipeline in streaming mode.
+
+- Checkpointing must be disabled. Artificial state backends are inserted.

Review comment:
       Some of these points (bounded sources, insert-only sources, disabled checkpointing) aren't really implications but rather prerequisites. I think we should list those separately in a "the following needs to be true" kind of sense, and then list the rest as implications.
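
One way to picture the split, as a minimal sketch in plain Java rather than Flink code: prerequisites are conditions that can be checked up front and that fail the job when unmet, whereas implications only describe how a valid job then runs. `BatchModePrerequisites`, `SourceInfo`, and `violations` are hypothetical names made up for this illustration and do not exist in Flink.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only; none of these types are part of Flink.
class BatchModePrerequisites {

    /** Hypothetical description of a table source. */
    static final class SourceInfo {
        final String name;
        final boolean bounded;
        final boolean insertOnly;

        SourceInfo(String name, boolean bounded, boolean insertOnly) {
            this.name = name;
            this.bounded = bounded;
            this.insertOnly = insertOnly;
        }
    }

    /** Returns every unmet prerequisite; an empty list means none was violated. */
    static List<String> violations(List<SourceInfo> sources, boolean checkpointingEnabled) {
        List<String> problems = new ArrayList<>();
        for (SourceInfo source : sources) {
            if (!source.bounded) {
                problems.add("source '" + source.name + "' is not bounded");
            }
            if (!source.insertOnly) {
                problems.add("source '" + source.name + "' emits non-insert changes");
            }
        }
        if (checkpointingEnabled) {
            problems.add("checkpointing is enabled");
        }
        return problems;
    }

    public static void main(String[] args) {
        List<SourceInfo> sources = List.of(
                new SourceInfo("orders", true, true),
                new SourceInfo("clicks", false, true));
        // Print each unmet prerequisite for this made-up job.
        for (String problem : violations(sources, true)) {
            System.out.println(problem);
        }
    }
}
```

Watermark behavior, blocking exchanges, and the inserted state backends have no counterpart in such a check, which is why they would remain in the list of implications.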




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
