Jungtaek Lim created SPARK-53687:
------------------------------------
Summary: Introduce WATERMARK clause in SQL statement
Key: SPARK-53687
URL: https://issues.apache.org/jira/browse/SPARK-53687
Project: Spark
Issue Type: Task
Components: Declarative Pipelines, SQL, Structured Streaming
Affects Versions: 4.1.0
Reporter: Jungtaek Lim
With Spark Declarative Pipelines, Apache Spark supports defining a streaming
query with a SQL statement (Flow or Streaming Table).
The SQL statement is expected to use streaming semantics (the STREAM
relation), which turns the query part into a streaming query. The query can
then be stateful: an aggregation becomes a streaming aggregation, and the same
applies to deduplication, stream-stream join, etc. In the majority of cases, a
watermark must be defined per source for the stateful query to work properly.
Databricks introduced the WATERMARK clause to cover this functionality in
Lakeflow Declarative Pipelines, but the clause was left out when the feature
was open sourced. This ticket is to open source the WATERMARK clause.
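
To illustrate why a stateful streaming query needs a watermark per source, here
is a minimal pure-Python sketch of the idea. It is not Spark's implementation
and does not show the proposed SQL syntax. The function name `windowed_counts`,
the 5-minute delay, and the 10-minute tumbling window are assumptions made for
illustration. The sketch also advances the watermark after every event, whereas
Spark advances it between micro-batches.

```python
from collections import defaultdict
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def windowed_counts(event_times, delay, window):
    """Count events per tumbling window, using a watermark to bound state.

    Hypothetical helper for illustration only.
    Returns (finalized, still_open, dropped).
    """
    max_event_time = None
    still_open = defaultdict(int)  # window start -> running count
    finalized = {}                 # window start -> final count
    dropped = []                   # events that arrived behind the watermark

    for ts in event_times:
        # Watermark = latest event time seen so far minus the allowed delay.
        watermark = max_event_time - delay if max_event_time else None
        if watermark is not None and ts < watermark:
            dropped.append(ts)  # too late: its window may already be closed
            continue

        # Assign the event to its tumbling window.
        start = ts - (ts - EPOCH) % window
        still_open[start] += 1

        # Advance the watermark and close windows that end at or before it.
        max_event_time = ts if max_event_time is None else max(max_event_time, ts)
        watermark = max_event_time - delay
        for w in list(still_open):
            if w + window <= watermark:
                finalized[w] = still_open.pop(w)

    return finalized, dict(still_open), dropped


def at(minute):
    return datetime(2025, 1, 1, 12, minute)


finalized, still_open, dropped = windowed_counts(
    [at(0), at(4), at(16), at(3), at(12)],
    delay=timedelta(minutes=5),
    window=timedelta(minutes=10),
)
```

In this run the 12:00 window is finalized once the watermark passes 12:10, the
12:03 event is dropped because it arrives behind the watermark, and the 12:10
window stays open. Without a watermark, no window could ever be finalized and
the state would grow without bound.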