Jungtaek Lim created SPARK-53687:
------------------------------------

             Summary: Introduce WATERMARK clause in SQL statement
                 Key: SPARK-53687
                 URL: https://issues.apache.org/jira/browse/SPARK-53687
             Project: Spark
          Issue Type: Task
          Components: Declarative Pipelines, SQL, Structured Streaming
    Affects Versions: 4.1.0
            Reporter: Jungtaek Lim


With Spark Declarative Pipelines, Apache Spark supports defining a streaming 
query with a SQL statement (a Flow or a Streaming Table).

The SQL statement is expected to use streaming semantics (a STREAM relation), 
which turns the query part into a streaming query. The query can then be 
stateful: for example, an aggregation becomes a streaming aggregation, and the 
same applies to deduplication and stream-stream joins. In the majority of 
cases, a watermark must be defined per source for the stateful query to work 
properly.
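
To illustrate why a stateful query needs a watermark, here is a minimal 
pure-Python sketch of the usual watermark rule: the watermark trails the 
maximum event time seen so far by a fixed delay, and events older than the 
current watermark are treated as too late. This is a simplified model, not 
Spark code. The function name simulate_watermark and its parameters are 
hypothetical, and it does not show the syntax of the proposed WATERMARK clause.

```python
from datetime import datetime, timedelta


def simulate_watermark(batches, delay):
    """Simplified model of a per-source watermark (assumption: not Spark code).

    batches: list of micro-batches, each a list of event-time datetimes.
    delay:   timedelta by which the watermark trails the max event time.
    Returns (accepted, dropped, final_watermark).
    """
    watermark = None
    max_event_time = None
    accepted, dropped = [], []

    for batch in batches:
        for event_time in batch:
            # Events older than the current watermark are considered too late.
            if watermark is not None and event_time < watermark:
                dropped.append(event_time)
            else:
                accepted.append(event_time)
            if max_event_time is None or event_time > max_event_time:
                max_event_time = event_time

        # The watermark advances only at the end of a micro-batch and never
        # moves backwards.
        if max_event_time is not None:
            candidate = max_event_time - delay
            watermark = candidate if watermark is None else max(watermark, candidate)

    return accepted, dropped, watermark


if __name__ == "__main__":
    t0 = datetime(2025, 1, 1, 12, 0)
    batches = [
        [t0, t0 + timedelta(minutes=10)],
        [t0 - timedelta(minutes=5), t0 + timedelta(minutes=5)],
    ]
    accepted, dropped, watermark = simulate_watermark(batches, timedelta(minutes=10))
    print("accepted:", accepted)
    print("dropped:", dropped)
    print("watermark:", watermark)
```

Without such a bound, a stateful operator could never tell when it is safe to 
finalize a result and discard the related state, which is why the watermark 
has to be declared for each source.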

Databricks introduced the WATERMARK clause to cover this functionality in 
Lakeflow Declarative Pipelines, but we missed introducing this clause when open 
sourcing the feature. This ticket is to open source the WATERMARK clause.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

