[GitHub] [flink] infoverload commented on a change in pull request #17260: [FLINK-21589][docs] Document table pipeline upgrades

GitBox Tue, 14 Sep 2021 04:52:04 -0700


infoverload commented on a change in pull request #17260:
URL: https://github.com/apache/flink/pull/17260#discussion_r708189420




##########
File path: docs/content/docs/dev/table/concepts/overview.md
##########
@@ -32,6 +32,82 @@ This means that Table API and SQL queries have the same semantics regardless whe
 
 The following pages explain concepts, practical limitations, and stream-specific configuration parameters of Flink's relational APIs on streaming data.
 
+State Management
+----------------
+
+Table programs that run in streaming mode leverage all capabilities of Flink as a stateful stream
+processor.
+
+In particular, a table program can be configured with a [state backend]({{< ref "docs/ops/state/state_backends" >}})
+and various [checkpointing options]({{< ref "docs/dev/datastream/fault-tolerance/checkpointing" >}})
+for handling different requirements regarding state size and fault tolerance. It is possible to take
+a savepoint of a running Table API & SQL pipeline and to restore the application's state at a later
+point in time.
+
+### State Usage
+
+Due to the declarative nature of Table API & SQL programs, it is not always obvious where and how much
+state is used within a pipeline. The planner decides whether state is necessary to compute a correct
+result. A pipeline is optimized to claim as little state as possible given the current set of optimizer
+rules.
+
+{{< hint info >}}
+Conceptually, source tables are never kept entirely in state. An implementer deals with logical tables
+(i.e., [dynamic tables]({{< ref "docs/dev/table/concepts/dynamic_tables" >}})). Their state requirements
+depend on the operations used.
+{{< /hint >}}
+
+Queries such as `SELECT ... FROM ... WHERE` that only consist of field projections or filters are usually
+stateless pipelines. However, operations such as joins, aggregations, or deduplications require keeping
+intermediate results in fault-tolerant storage, for which Flink's state abstractions are used.
+
+{{< hint info >}}
+Please refer to the individual operator documentation for more details about how much state is required
+and how to limit a potentially ever-growing state size.
+{{< /hint >}}
+
+For example, a regular SQL join of two tables requires the operator to keep both input tables in state
+entirely. For correct SQL semantics, the runtime needs to assume that a match could occur at any
+point in time from both sides. Flink provides [optimized window and interval joins]({{< ref "docs/dev/table/sql/queries/joins" >}})
+that aim to keep the state size small by exploiting the concept of [watermarks]({{< ref "docs/dev/table/concepts/time_attributes" >}}).
+
+### Stateful Upgrades and Evolution
+
+Table programs that are executed in streaming mode are intended as *standing queries* that statically
+define an end-to-end pipeline.

Review comment:
   ```suggestion
   Table programs that are executed in streaming mode are intended as *standing queries* (defined once and then executed continuously) that statically define an end-to-end pipeline.
   ```
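The distinction drawn in the proposed text between stateless projections/filters and stateful operations can be illustrated with a toy model in plain Python. This is a sketch under stated assumptions, not Flink's implementation: the function `filter_project`, the class `CountPerUser`, and the `(name, amount)` row shape are hypothetical, and the input is assumed to be an append-only stream.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical row shape for an append-only input table: (name, amount).
Row = Tuple[str, int]


def filter_project(rows: Iterable[Row]) -> Iterator[str]:
    """Toy model of `SELECT name FROM t WHERE amount > 10`.

    Each row is decided on its own, so nothing is remembered between rows.
    """
    for name, amount in rows:
        if amount > 10:
            yield name


class CountPerUser:
    """Toy model of `SELECT name, COUNT(*) FROM t GROUP BY name`.

    The running count of every key seen so far must be kept, because any
    future row may update it. The dict stands in for keyed state.
    """

    def __init__(self) -> None:
        self.state: Dict[str, int] = defaultdict(int)

    def process(self, row: Row) -> Tuple[str, int]:
        name, _ = row
        self.state[name] += 1
        # Emit the updated result for this key.
        return name, self.state[name]


rows = [("alice", 5), ("bob", 20), ("alice", 30), ("carol", 1)]
print(list(filter_project(rows)))      # → ['bob', 'alice']
agg = CountPerUser()
print([agg.process(r) for r in rows])  # → [('alice', 1), ('bob', 1), ('alice', 2), ('carol', 1)]
print(len(agg.state))                  # → 3
```

The filter handles each row in isolation, while the aggregation's dictionary grows with the number of distinct keys; in a real Flink job, that per-key data would live in the configured state backend.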

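The proposed paragraph contrasting regular joins with watermark-based joins can likewise be sketched as a toy model. Assumptions: `RegularJoin` and `IntervalJoin` are hypothetical names, rows are `(key, timestamp)` pairs, both inputs share a single watermark, and no row ever arrives with a timestamp at or below the current watermark. Flink's actual join operators differ in many details.

```python
from typing import Dict, List, Tuple

# Hypothetical row shape: (join key, event timestamp).
Row = Tuple[str, int]
Pair = Tuple[Row, Row]


class RegularJoin:
    """Toy inner equi-join: every row of both inputs is kept, because a
    matching row may still arrive from the other side at any time."""

    def __init__(self) -> None:
        self.left: Dict[str, List[Row]] = {}
        self.right: Dict[str, List[Row]] = {}

    def on_left(self, row: Row) -> List[Pair]:
        self.left.setdefault(row[0], []).append(row)
        return [(row, rrow) for rrow in self.right.get(row[0], [])]

    def on_right(self, row: Row) -> List[Pair]:
        self.right.setdefault(row[0], []).append(row)
        return [(lrow, row) for lrow in self.left.get(row[0], [])]

    def state_size(self) -> int:
        return sum(len(rs) for side in (self.left, self.right) for rs in side.values())


class IntervalJoin(RegularJoin):
    """Toy interval join: rows match only if their timestamps differ by at
    most `bound`. Assumes future rows have timestamps above the watermark,
    so a row with ts + bound <= watermark can never match again."""

    def __init__(self, bound: int) -> None:
        super().__init__()
        self.bound = bound

    def _within(self, pair: Pair) -> bool:
        (_, l_ts), (_, r_ts) = pair
        return abs(l_ts - r_ts) <= self.bound

    def on_left(self, row: Row) -> List[Pair]:
        return [p for p in super().on_left(row) if self._within(p)]

    def on_right(self, row: Row) -> List[Pair]:
        return [p for p in super().on_right(row) if self._within(p)]

    def on_watermark(self, watermark: int) -> None:
        # Drop rows that can no longer fall within the bound of any future row.
        for side in (self.left, self.right):
            for key in list(side):
                kept = [r for r in side[key] if r[1] + self.bound > watermark]
                if kept:
                    side[key] = kept
                else:
                    del side[key]


regular, interval = RegularJoin(), IntervalJoin(bound=5)
matches = {"regular": 0, "interval": 0}
for t in range(0, 100, 10):
    for name, op in (("regular", regular), ("interval", interval)):
        matches[name] += len(op.on_left(("k", t)))
        matches[name] += len(op.on_right(("k", t + 2)))
    interval.on_watermark(t)  # assume the watermark advances to t after each round
print(matches, regular.state_size(), interval.state_size())
# → {'regular': 100, 'interval': 10} 20 2
```

The regular join retains all 20 rows it has seen, whereas the interval join keeps only the rows that could still match under the time bound once the watermark has advanced. This is the effect the proposed paragraph attributes to exploiting watermarks.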


