MartijnVisser commented on a change in pull request #17260:
URL: https://github.com/apache/flink/pull/17260#discussion_r707279337
##########
File path: docs/content/docs/dev/table/concepts/overview.md
##########
@@ -32,6 +32,79 @@ This means that Table API and SQL queries have the same semantics regardless whe
The following pages explain concepts, practical limitations, and stream-specific configuration parameters of Flink's relational APIs on streaming data.
+State Management
+----------------
+
+Table programs that run in streaming mode leverage all capabilities of Flink as a stateful stream
+processor.
+
+In particular, a table program can be configured with a [state backend]({{< ref "docs/ops/state/state_backends" >}})
+and various [checkpointing options]({{< ref "docs/dev/datastream/fault-tolerance/checkpointing" >}})
+for handling large amounts of state and fault tolerance. It is possible to take a savepoint of a running
+Table API & SQL pipeline and to restore the application's state at later point in time.
+
+### State Usage
+
+Due to the declarative nature of Table API & SQL program, it is not always obvious where and how much
+state is used within a table pipeline. The planner decides about when state is necessary to compute a correct
+result. A pipeline is optimized to claim as little state as possible given the current set of optimizer
+rules.
+
+{{< hint info >}}
+Source tables are never kept entirely in state. This depends on the used operations.
+{{< /hint >}}
+
+Simple `SELECT ... FROM ... WHERE` queries that only consist of field projections or filters are usually
+stateless pipelines. However, operations such as joins, aggregations, or deduplications require to keep
+intermediate results in a fault tolerant storage for which Flink's state abstractions are used.
+
+{{< hint info >}}
+Please refer to the individual operator documentation for more details about how much state is required
+and how to limit a potentially ever growing state size.
+{{< /hint >}}
+
+For example, a regular SQL join of two tables requires the operator to keep both input tables in state
+entirely. For correct SQL semantics, the runtime needs to assume that a matching could occur at any
+point in time from both sides. Flink provides [optimized window and interval joins]({{< ref "docs/dev/table/sql/queries/joins" >}})
+that aim to keep the state size small by exploiting the concept of [watermarks]({{< ref "docs/dev/table/concepts/time_attributes" >}}).
+
+### Stateful Upgrades and Evolution
+
+Table programs that are executed in streaming mode are intended as *standing queries* that statically
+define an end-to-end pipline.
Review comment:
```suggestion
define an end-to-end pipeline.
```
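To make the State Usage paragraphs quoted above concrete, here is a minimal, dependency-free Java sketch. It is a toy model, not Flink's runtime: the names `StateUsageSketch`, `RegularJoin`, and `IntervalJoin` are hypothetical, and it assumes no row ever arrives with a timestamp below the current watermark. A filter keeps no state, a regular join buffers every row from both inputs, and an interval join can evict rows once the watermark guarantees they can no longer match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Toy model (not Flink's runtime operators) of the state kept by relational operations. */
final class StateUsageSketch {

    /** A row with a join key and an event-time timestamp in milliseconds. */
    static final class Row {
        final String key;
        final long ts;
        Row(String key, long ts) { this.key = key; this.ts = ts; }
    }

    /** A filter decides row by row, so it keeps no state at all. */
    static List<Row> filter(List<Row> input, Predicate<Row> predicate) {
        List<Row> out = new ArrayList<>();
        for (Row row : input) {
            if (predicate.test(row)) { out.add(row); }
        }
        return out;
    }

    /** Regular join: a match may arrive at any time from either side, so every row stays in state. */
    static class RegularJoin {
        final List<Row> left = new ArrayList<>();
        final List<Row> right = new ArrayList<>();
        int matches = 0;

        void onLeft(Row row) { left.add(row); matches += countMatches(row, right); }
        void onRight(Row row) { right.add(row); matches += countMatches(row, left); }
        boolean joins(Row a, Row b) { return a.key.equals(b.key); }
        int stateSize() { return left.size() + right.size(); }

        private int countMatches(Row row, List<Row> other) {
            int n = 0;
            for (Row o : other) {
                if (joins(row, o)) { n++; }
            }
            return n;
        }
    }

    /**
     * Interval join: rows match only if their timestamps differ by at most boundMs.
     * Assuming no row arrives with a timestamp below the watermark, a row older than
     * (watermark - boundMs) can never match again and is evicted from state.
     */
    static final class IntervalJoin extends RegularJoin {
        final long boundMs;
        IntervalJoin(long boundMs) { this.boundMs = boundMs; }

        @Override
        boolean joins(Row a, Row b) {
            return super.joins(a, b) && Math.abs(a.ts - b.ts) <= boundMs;
        }

        void onWatermark(long watermark) {
            left.removeIf(row -> row.ts < watermark - boundMs);
            right.removeIf(row -> row.ts < watermark - boundMs);
        }
    }

    public static void main(String[] args) {
        RegularJoin regular = new RegularJoin();
        IntervalJoin interval = new IntervalJoin(1_000);
        // Ten rows per side with the same key, one per second; the watermark follows the rows.
        for (long t = 0; t < 10_000; t += 1_000) {
            Row row = new Row("k", t);
            regular.onLeft(row);
            regular.onRight(row);
            interval.onLeft(row);
            interval.onRight(row);
            interval.onWatermark(t);
        }
        System.out.println("regular join state rows: " + regular.stateSize());   // 20
        System.out.println("interval join state rows: " + interval.stateSize()); // 4
    }
}
```

With ten rows per side and a one-second bound, the regular join ends up holding all 20 rows, while the interval join only holds the rows that are still within the bound of the latest watermark, and it still finds every pair it would have found without eviction. Flink's actual window and interval joins also deal with late data, timers, and state backends, which this sketch ignores.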
##########
File path: docs/content/docs/dev/table/concepts/overview.md
##########
+define an end-to-end pipline.
+
+In case of stateful pipelines, any change to both the query or Flink's planner might lead to a completely
+different execution plan. This makes stateful upgrades and the evolution of table programs challenging
+at the moment. The community is working on improving those shortcomings.
+
+For example, by adding a filter predicate, the optimizer might decide to reorder joins or change the
+schema of an intermediate operator. This prevents restoring from a savepoint due to either changed
+topology or different column layout within the state of an operator.
+
+The query implementer must ensure that the optimized plans before and after the change are compatible.
+Use the `EXPLAIN` command in SQL or `table.explain()` in Table API to [get insights]({{< ref "docs/dev/table/common" >}}#explaining-a-table).
+
+Since new optimizer rules are continously added, and operators become more efficient and specialized,
Review comment:
```suggestion
Since new optimizer rules are continuously added, and operators become more efficient and specialized,
```
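The point in the quoted paragraphs, that a plan change can block restoring from a savepoint because of either a changed topology or a different column layout in an operator's state, can be pictured with a toy comparison. This is not Flink's restore logic, which is considerably more involved; `PlanCompatibilitySketch` is a hypothetical helper, and the operator names and column lists stand in for what one might read off two `EXPLAIN` outputs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model (not Flink's restore logic): a plan is described by its stateful
 * operators, each mapped to the column layout of the state it keeps.
 */
final class PlanCompatibilitySketch {

    /** Lists reasons why state taken with oldPlan could not be restored into newPlan. */
    static List<String> incompatibilities(Map<String, List<String>> oldPlan,
                                          Map<String, List<String>> newPlan) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : oldPlan.entrySet()) {
            List<String> newLayout = newPlan.get(entry.getKey());
            if (newLayout == null) {
                // The stateful operator no longer exists: the topology changed.
                problems.add("topology changed: no operator " + entry.getKey());
            } else if (!newLayout.equals(entry.getValue())) {
                // The operator still exists, but its state stores different columns.
                problems.add("state layout changed for " + entry.getKey()
                        + ": " + entry.getValue() + " -> " + newLayout);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, List<String>> before = new LinkedHashMap<>();
        before.put("Join(orders, customers)", List.of("order_id", "customer_id", "amount"));
        before.put("GroupAggregate(customer_id)", List.of("customer_id", "sum_amount"));

        // Hypothetical plan after adding a filter predicate: the join inputs were
        // reordered and the aggregate state gained a column.
        Map<String, List<String>> after = new LinkedHashMap<>();
        after.put("Join(customers, orders)", List.of("customer_id", "order_id", "amount"));
        after.put("GroupAggregate(customer_id)", List.of("customer_id", "sum_amount", "cnt"));

        List<String> problems = incompatibilities(before, after);
        if (problems.isEmpty()) {
            System.out.println("plans look restore-compatible (in this toy model)");
        } else {
            problems.forEach(System.out::println);
        }
    }
}
```

The sketch treats a missing operator as fatal; in a real deployment, whether state that cannot be mapped blocks a restore also depends on options such as allowing non-restored state, which are deliberately left out here.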
##########
File path: docs/content/docs/dev/table/concepts/overview.md
##########
+Table API & SQL pipeline and to restore the application's state at later point in time.
Review comment:
```suggestion
Table API & SQL pipeline and to restore the application's state at a later point in time.
```
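The savepoint-and-restore cycle in the quoted sentence can be pictured with a toy keyed counter. This is an illustration under stated assumptions, not Flink's state backend or savepoint API: `SavepointSketch`, `snapshot()`, and the restoring constructor are hypothetical names. A snapshot is an immutable copy of the per-key state, and a restored instance continues from that copy instead of from empty state.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model (not Flink's state backend or savepoint API): a per-key running
 * count whose state can be copied out, like a savepoint, and restored later.
 */
final class SavepointSketch {
    private final Map<String, Long> countsByKey;

    /** Starts with empty state, as a job without a savepoint would. */
    SavepointSketch() {
        this(Map.of());
    }

    /** Starts from a previously taken snapshot instead of empty state. */
    SavepointSketch(Map<String, Long> snapshot) {
        this.countsByKey = new HashMap<>(snapshot);
    }

    /** Processes one row and returns the updated count for its key. */
    long onRow(String key) {
        return countsByKey.merge(key, 1L, Long::sum);
    }

    /** Returns an immutable copy of the current state; later updates do not change it. */
    Map<String, Long> snapshot() {
        return Map.copyOf(countsByKey);
    }

    public static void main(String[] args) {
        SavepointSketch running = new SavepointSketch();
        running.onRow("a");
        running.onRow("a");
        running.onRow("b");
        Map<String, Long> savepoint = running.snapshot(); // holds a=2, b=1

        // Updates after the snapshot are not part of the savepoint.
        running.onRow("a");
        running.onRow("a");

        // A restarted pipeline resumes from the savepoint rather than from zero.
        SavepointSketch restored = new SavepointSketch(savepoint);
        System.out.println(restored.onRow("a")); // prints 3
    }
}
```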
--
This is an automated message from the Apache Git Service.