Re: Re: Re: [DISCUSS] FLIP-XXX: Support Watermark Definition in SQL Views

FeatZhang Wed, 20 May 2026 02:29:53 -0700

Hi Xuyang,

Thanks a lot for the confirmation, and for the very thorough back-and-forth
throughout this discussion — it really helped sharpen the proposal,
especially around the "arbitrary position" semantics, the planner-side
contract, and the catalog persistence story. Much appreciated!


@all: with Xuyang's last round of feedback addressed, the FLIP has
stabilized.

If anyone — Timo, Sergey, Yunfeng, Jingsong, Ron, Shengkai, and others
who have been following the thread — still has open concerns, please let
me know in the next ~3 days. Otherwise I'll start the VOTE thread.

Thanks again to everyone for the reviews!

Best,
FeatZhang

On Mon, May 18, 2026 at 5:20 PM Xuyang <[email protected]> wrote:

> Hi FeatZhang,
>
> Thanks for the update and the revised proposal! I don't have further
> questions. Looking forward to seeing this move forward.
>
>
>
> --
>
>     Best！
>     Xuyang
>
>
>
> 在 2026-05-16 13:44:20，"FeatZhang" <[email protected]> 写道：
> >Hi Xuyang,
> >
> >Thanks a lot for the careful follow-ups — these are exactly the right
> >questions to ask before we move to a vote. After re-checking the code
> >paths you pointed at, I want to revise several statements I made in my
> >previous reply, and tighten the FLIP accordingly. I'll go through them
> >one by one.
> >
> >1. Semantics at any position in the pipeline
> >=============================================
> >Fully agree. To make this concrete, I'll add a new sub-section
> >"Semantics at Different Pipeline Positions" to the FLIP, covering:
> >
> >  - APPLY_WATERMARK on a base table (with or without an existing
> >    DDL watermark)
> >  - APPLY_WATERMARK on top of a non-materialized view / sub-query
> >  - APPLY_WATERMARK applied multiple times in the same query
> >  - Interaction with TUMBLE / HOP / SESSION / CUMULATE
> >  - Interaction with joins (regular / interval / temporal)
> >
> >The mental model will be explicitly aligned with the DataStream API:
> >each APPLY_WATERMARK in SQL corresponds to one
> >`assignTimestampsAndWatermarks(...)` call in the DataStream pipeline,
> >applied in the order they appear.
> >
> >2. Monotonicity validation
> >==========================
> >You're right, and I want to correct my earlier reply. Today's
> >`CREATE TABLE ... WATERMARK FOR ... AS ...` does NOT enforce
> >monotonicity at the planner level — the planner only checks that:
> >
> >  - the rowtime column exists and is of TIMESTAMP / TIMESTAMP_LTZ,
> >  - the watermark expression is a valid scalar expression over the
> >    table's schema and resolves to a comparable type.
> >
> >Monotonicity is a runtime contract: the WatermarkAssignerOperator
> >emits watermarks that are guaranteed to be non-decreasing.
> >
> >APPLY_WATERMARK will follow exactly the same contract — no stricter
> >planner-level monotonicity check. I'll update the FLIP's "Planner
> >Changes → Validation" section to reflect this.
> >
> >3. Override timing — clarification
> >==================================
> >I think this was a wording issue on my side rather than a real
> >design disagreement. Let me restate it:
> >
> >  - APPLY_WATERMARK introduces a dedicated WatermarkAssigner node in
> >    the plan. Whether the input already carries a watermark or not,
> >    the plan ends up with the new assigner positioned downstream of
> >    the existing one.
> >  - At runtime there is no "merge" or "reconciliation": each
> >    WatermarkAssigner operator independently emits its own watermark
> >    stream; downstream operators simply observe the watermark from
> >    the most recent upstream assigner.
> >
> >This is the same model as calling `assignTimestampsAndWatermarks()`
> >twice in DataStream — the second call wins because it sits later in
> >the operator chain, not because of any planner-level magic.
> >
> >So "planner-level override" was a poor choice of words. The correct
> >description is: **the planner decides the operator topology; the
> >runtime emits watermarks according to that topology**, exactly like
> >DataStream. I'll rephrase the FLIP accordingly and drop the
> >"override" terminology.
> >
> >4. "Watermark expression evaluation: needs to support arbitrary
> >   expressions"
> >====================================================================
> >Apologies, this statement was inaccurate. After re-checking,
> >StreamExecWatermarkAssigner already evaluates the watermark
> >expression through the standard `ExprCodeGenerator`, which supports
> >the same scalar expressions as DDL today (arithmetic on TIMESTAMP /
> >TIMESTAMP_LTZ, INTERVAL arithmetic, scalar UDFs, etc.).
> >
> >What APPLY_WATERMARK actually needs from the ExecNode is:
> >
> >  - resolving the rowtime column index from the DESCRIPTOR, since the
> >    input may be a non-base-table (view / sub-query / projected
> >    relation),
> >  - wiring the watermark expression's input row to the upstream
> >    operator's output row instead of a TableScan output.
> >
> >No new expression capability is required. I'll fix this in the FLIP.
> >
> >5. State management
> >===================
> >You're right, this should be removed. In the scope of this FLIP the
> >APPLY_WATERMARK ExecNode is **stateless**, identical to the existing
> >`WatermarkAssignerOperator`. It does not buffer rows and does not
> >evict late data; late-data handling remains the responsibility of
> >the downstream window / join operators, exactly as it works today.
> >
> >The "state management" bullet in my previous reply was speculation
> >about future watermark strategies (idle source detection, etc.) and
> >does not belong in this FLIP. I'll drop it.
> >
> >------------------------------------------------------------
> >Updated summary of the design after this round
> >------------------------------------------------------------
> >
> >  - Scope: base tables, non-materialized views, sub-queries; any
> >    relation position in the query.
> >  - Semantics: aligned with DataStream API
> >    `assignTimestampsAndWatermarks()`; multiple applications in the
> >    same query are positional, not "overriding".
> >  - Validation: same contract as today's DDL — scalar expression on
> >    a TIMESTAMP / TIMESTAMP_LTZ rowtime column; no planner-level
> >    monotonicity check.
> >  - ExecNode: stateless, reuses StreamExecWatermarkAssigner with
> >    minor wiring changes (rowtime column resolution + non-TableScan
> >    input handling).
> >  - Out of scope: WatermarkFunction interface, idle-source state,
> >    runtime-level merge of multiple watermark strategies.
> >
> >I'll update the FLIP document and PR #27984 to match the points
> >above, and post a diff summary here once it's done.
> >
> >Thanks again for pushing on these — the FLIP is much cleaner after
> >this round.
> >
> >Best regards,
> >FeatZhang
> >
> >On Mon, May 11, 2026 at 10:45 AM Xuyang <[email protected]> wrote:
> >
> >> Hi FeatZhang. Thanks for the detailed responses. I have a few follow-up
> >> comments and questions:
> >>
> >>
> >> 1. Support for APPLY_WATERMARK at any node/position
> >> I generally agree with the direction that APPLY_WATERMARK should be
> >> applicable at any node and any position — one or more times — similar to
> >> how the DataStream API allows watermark assignment. However, I think we
> >> need to clearly articulate the precise behavior/semantics in each
> scenario
> >> to reduce user confusion. Aligning the mental model with the DataStream
> API
> >> (where users can call assignTimestampsAndWatermarks() at arbitrary
> points
> >> in the pipeline) would also help lower the learning curve.
> >> 2. Monotonicity validation of watermark expressions
> >> Why do we need to enforce monotonicity guarantees on the watermark
> >> expression at the planner level? As far as I know, Flink SQL currently
> does
> >> NOT perform such validation at the DDL level for CREATE TABLE ...
> WATERMARK
> >> FOR ... AS .... What it actually does is ensure at runtime that emitted
> >> watermarks are non-decreasing. If the existing DDL path does not
> validate
> >> monotonicity at planning time, why should APPLY_WATERMARK introduce a
> >> stricter contract?
> >> 3. Planner-level watermark override
> >> Could you elaborate more on why the watermark override must happen at
> the
> >> planner level? If I understand correctly, in the DataStream API, users
> can
> >> define different watermark strategies at different nodes in the same
> >> pipeline, and the runtime handles watermark propagation naturally.
> >> 4. "Watermark expression evaluation: Needs to support arbitrary
> >> expressions"
> >> You mentioned that StreamExecWatermarkAssigner currently has limitations
> >> in watermark expression evaluation and "needs to support arbitrary
> >> expressions." Could you clarify what the current limitations are
> exactly?
> >> Today's CREATE TABLE ... WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
> >> already supports scalar expressions, what additional expression types
> does
> >> APPLY_WATERMARK require that are not already supported?
> >> 5. State management
> >> You mentioned "State management". Are we proposing to introduce a
> stateful
> >> watermark assigner node that evicts late data? This sounds like a
> >> significant change that goes well beyond the scope of this FLIP. The
> >> current WatermarkAssigner is stateless, it simply computes and emits
> >> watermarks without buffering data.
> >> Looking forward to your clarification!
> >>
> >>
> >>
> >>
> >>
> >> --
> >>
> >>     Best！
> >>     Xuyang
> >>
> >>
> >>
> >> At 2026-05-09 19:08:32, "熊饶饶" <[email protected]> wrote:
> >> >Hi Feat Zhang,
> >> >
> >> >+1 (non-binding)
> >> >
> >> >This is a well-motivated proposal that addresses a real pain point in
> >> Flink SQL. The inability to define watermarks in views currently forces
> >> users to either:
> >> >
> >> >Reference underlying table watermarks directly, breaking encapsulation
> >> >Create separate physical tables for each watermark strategy, leading to
> >> data duplication
> >> >The proposed syntax CREATE VIEW ... WATERMARK FOR col AS expr is
> >> intuitive and aligns naturally with existing DDL watermark semantics.
> The
> >> backward-compatible design (storing watermark metadata in catalog
> options)
> >> is also a smart choice — it avoids breaking existing views while
> enabling
> >> the new capability.
> >> >
> >> >The Data Lakehouse / medallion architecture use case is particularly
> >> compelling. Being able to define watermark strategies at the Silver/Gold
> >> layer while keeping Bronze as raw data would significantly simplify
> >> pipeline design for many teams.
> >> >
> >> >Looking forward to seeing this move to a formal FLIP!
> >> >
> >> >Best regards,
> >> >Raorao Xiong
> >> >
> >> >> 2026年5月7日 20:56，FeatZhang <[email protected]> 写道：
> >> >>
> >> >> Hi Xuyang,
> >> >>
> >> >> Thank you for the thorough review and thoughtful questions.
> >> >>
> >> >> *Problem this FLIP aims to solve*: The core goal of this FLIP is to
> >> allow
> >> >> watermarks to be defined on *computed columns (and, more generally,
> on
> >> any
> >> >> column produced inside a SQL query) directly in SQL statements*.
> Today,
> >> >> watermarks in Flink SQL can only be declared at the CREATE TABLE
> level
> >> >> via WATERMARK
> >> >> FOR ... AS ..., which means the time attribute must be a column
> visible
> >> at
> >> >> the base table DDL. This makes it impossible to attach a watermark
> to a
> >> >> timestamp derived inside a query — for example, one computed by
> string
> >> >> parsing, JSON extraction, or any expression inside a view or
> subquery —
> >> >> without pushing that computation back down into the source table
> DDL. By
> >> >> introducing APPLY_WATERMARK as an explicit relational operator that
> can
> >> be
> >> >> applied to *base tables, non-materialized views, and subqueries*,
> users
> >> can
> >> >> assign watermark semantics to any (computed) column produced by a
> query,
> >> >> which also addresses the broader motivations listed in the FLIP:
> broken
> >> >> layered-pipeline abstractions, lack of per-query / multi-tenant
> >> watermark
> >> >> strategies, and the current gap between SQL and the DataStream API.
> >> >>
> >> >> Now let me address each of your points:
> >> >> Support for Non-materialized Views
> >> >>
> >> >> You raised an excellent point about the scope of APPLY_WATERMARK in
> >> layered
> >> >> architectures.
> >> >>
> >> >> *My position*: APPLY_WATERMARK should support base tables,
> >> non-materialized
> >> >> views, *and subqueries* (this matches the Goals section of the FLIP).
> >> >> Here's why:
> >> >>
> >> >>   - Non-materialized views dissolve into the surrounding plan during
> >> >>   optimization (inline expansion)
> >> >>   - There's no physical "view" node in the execution plan—just a
> logical
> >> >>   alias
> >> >>   - The watermark becomes a relational transformation applied on top
> of
> >> >>   the view's / subquery's output
> >> >>
> >> >> The key design principle is: *watermark definition is an explicit
> >> >> relational operator, not attached metadata*.
> >> >>
> >> >> I'd also like to clarify the positions in the prior thread to avoid
> >> >> confusion:
> >> >>
> >> >>   - *Lincoln* originally proposed APPLY_WATERMARK(table,
> >> DESCRIPTOR(col),
> >> >>   expr) scoped to base tables only.
> >> >>   - *Timo* raised the concern about blurring the view abstraction
> ("*A
> >> >>   view usually dissolves into the plan … would a watermark definition
> >> >>   suddenly introduce an optimization barrier? If this is an
> optimization
> >> >>   barrier, is this still a view or a new concept?*"). This is exactly
> >> why
> >> >>   this FLIP does *not* attach watermarks to views via CREATE VIEW /
> >> ALTER
> >> >>   VIEW, and keeps views as pure logical aliases.
> >> >>   - *Gyula* emphasized that watermark assignment should be available
> on
> >> >>   views and subqueries too, consistent with the DataStream API.
> >> >>
> >> >> To address Timo's concern concretely:
> >> >>
> >> >>   - Watermark semantics are applied at query planning time via an
> >> explicit
> >> >>   relational operator (APPLY_WATERMARK), not hidden in view/catalog
> >> metadata.
> >> >>   - No watermark information is persisted into the catalog for views
> —
> >> the
> >> >>   catalog stays unchanged (see FLIP "Catalog Changes": *No catalog
> >> changes
> >> >>   are required*).
> >> >>   - Views continue to dissolve transparently into the plan; the
> >> >>   optimization barrier only appears where APPLY_WATERMARK is
> explicitly
> >> used.
> >> >>
> >> >> Monotonicity Validation
> >> >>
> >> >> Great question! Monotonicity guarantees are essential for watermark
> >> >> correctness:
> >> >>
> >> >>   - Watermarks define the boundary of "late" data
> >> >>   - If the watermark expression is not monotonically non-decreasing,
> the
> >> >>   watermark could move backward
> >> >>   - This would cause data that was previously considered "on-time"
> to be
> >> >>   treated as late (or vice-versa), breaking event-time semantics
> >> >>
> >> >> *Validation requirement*: In line with the FLIP's Planner Changes
> >> section,
> >> >> the planner validates that watermark_expression is a valid *scalar*
> >> expression
> >> >> over columns of the input schema, and (as with today's CREATE TABLE
> ...
> >> >> WATERMARK FOR ... AS ...) the expression must produce a monotonically
> >> >> non-decreasing value relative to the designated rowtime column.
> >> >>
> >> >> Typical valid forms are the same as what's allowed in DDL today, for
> >> >> example:
> >> >>
> >> >> -- Bounded out-of-orderness (most common)
> >> >> APPLY_WATERMARK(t, DESCRIPTOR(ts), ts - INTERVAL '5' SECOND)
> >> >> -- Strictly ascending
> >> >> APPLY_WATERMARK(t, DESCRIPTOR(ts), ts)
> >> >>
> >> >> Note: watermark_expression is a *scalar expression* per the FLIP
> (not an
> >> >> aggregate / window function). Richer forms such as user-defined
> >> watermark
> >> >> strategies are explicitly out of scope for this FLIP and are tracked
> as
> >> a
> >> >> future WatermarkFunction interface, which also depends on the Calcite
> >> >> lambda upgrade mentioned by Timo.
> >> >> Override Timing (Planner vs Runtime)
> >> >>
> >> >> You raised a valid concern. Let me clarify the design, which aligns
> with
> >> >> the FLIP's "Planner Changes → Interaction with Existing Table
> >> Watermarks"
> >> >> section:
> >> >>
> >> >> *Current proposal*: Planner-level override
> >> >>
> >> >>   - During query compilation, when the input to APPLY_WATERMARK
> already
> >> >>   carries a watermark (e.g., from CREATE TABLE ... WATERMARK), the
> >> >>   LogicalWatermarkAssigner node produced by APPLY_WATERMARK
> *overrides*
> >> the
> >> >>   upstream watermark strategy.
> >> >>   - When the input has no watermark (e.g., a view or a subquery),
> >> >>   APPLY_WATERMARK introduces a new one.
> >> >>   - This makes override behavior explicit in the plan and keeps room
> for
> >> >>   standard optimizations.
> >> >>
> >> >> *Why not runtime-level override*:
> >> >>
> >> >>   1. Planner-level override keeps watermark semantics a first-class,
> >> >>   visible part of the plan (consistent with how VECTOR_SEARCH /
> >> ML_PREDICT
> >> >>   are modeled as specialized ExecNodes in the FLIP).
> >> >>   2. The override point is deterministic and inspectable via EXPLAIN.
> >> >>   3. Simpler execution model — no dual-watermark reconciliation at
> >> runtime.
> >> >>
> >> >> If concrete use cases for a runtime-level override surface later, we
> >> could
> >> >> revisit this via a hint (e.g., /*+ RUNTIME_OVERRIDE */), but it's not
> >> part
> >> >> of this FLIP.
> >> >> Relationship with BuiltInProcessTableFunction
> >> >>
> >> >> Good observation. APPLY_WATERMARK is declared as a built-in PTF at
> the
> >> SQL
> >> >> surface, similar in spirit to TO_CHANGELOG / FROM_CHANGELOG, but the
> >> FLIP
> >> >> intentionally maps it to a *specialized ExecNode* rather than a
> generic
> >> PTF
> >> >> runtime — the same pattern used by VECTOR_SEARCH and ML_PREDICT.
> >> >>
> >> >> *Option A: Reuse the generic BuiltInProcessTableFunction runtime*
> >> >>
> >> >>   - Pros: Consistent with other built-in PTFs at the runtime layer.
> >> >>   - Cons: Watermark assignment is not a row-transforming PTF — it
> >> changes
> >> >>   stream metadata (time attribute + watermark strategy). Forcing it
> >> through
> >> >>   the generic PTF runtime would require extending the PTF contract
> with
> >> >>   watermark semantics.
> >> >>
> >> >> *Option B: Dedicated LogicalWatermarkAssigner + specialized ExecNode
> >> (the
> >> >> FLIP's choice)*
> >> >>
> >> >>   - Pros: Keeps watermark semantics a first-class citizen in the
> >> planner;
> >> >>   cleanly integrates with watermark propagation rules; no need to
> >> overload
> >> >>   the PTF contract; same pattern as VECTOR_SEARCH / ML_PREDICT
> already
> >> >>   established in Flink.
> >> >>   - Cons: A new dedicated node, though that cost is small compared to
> >> the
> >> >>   semantic clarity.
> >> >>
> >> >> *Current decision*: Option B, as stated in the FLIP
> ("*APPLY_WATERMARK
> >> >> compiles to a specialized ExecNode --- similar to how VECTOR_SEARCH
> and
> >> >> ML_PREDICT are handled*"). Open to revisiting based on community
> >> feedback.
> >> >> StreamExecWatermarkAssigner Sufficiency
> >> >>
> >> >> For the physical implementation:
> >> >>
> >> >> *Yes, StreamExecWatermarkAssigner should be sufficient*, with some
> >> >> modifications:
> >> >>
> >> >>   1. *Input handling*: Currently assumes direct table scan; needs to
> >> >>   handle APPLY_WATERMARK's column mapping
> >> >>   2. *Watermark expression evaluation*: Needs to support arbitrary
> >> >>   expressions (currently limited)
> >> >>   3. *State management*: May need additional state for handling
> >> >>   out-of-order events
> >> >>
> >> >> The key insight is that APPLY_WATERMARK conceptually translates to:
> >> >>
> >> >> TableScan -> Calc (expression evaluation) -> WatermarkAssigner
> >> >>
> >> >> StreamExecWatermarkAssigner handles the last step; the Calc step
> handles
> >> >> the expression.
> >> >> ------------------------------
> >> >> Summary of Proposed Responses
> >> >>
> >> >>   - *Scope of input*: Support base tables, non-materialized views
> *and
> >> >>   subqueries* (per FLIP Goals); views/catalog semantics stay
> unchanged.
> >> >>   - *Monotonicity*: Validate watermark_expression as a scalar
> >> expression;
> >> >>   same monotonicity contract as today's DDL watermarks.
> >> >>   - *Override timing*: Planner-level override at
> >> LogicalWatermarkAssigner;
> >> >>   potential /*+ RUNTIME_OVERRIDE */ hint as future work.
> >> >>   - *PTF reuse*: Dedicated LogicalWatermarkAssigner + specialized
> >> ExecNode
> >> >>   (same pattern as VECTOR_SEARCH / ML_PREDICT).
> >> >>   - *ExecNode sufficiency*: StreamExecWatermarkAssigner is sufficient
> >> with
> >> >>   minor modifications (input handling + expression evaluation).
> >> >>
> >> >> ------------------------------
> >> >>
> >> >> Looking forward to further discussion!
> >> >>
> >> >> Best regards,
> >> >> FeatZhang
> >> >>
> >> >>
> >> >> On Wed, May 6, 2026 at 4:39 PM Xuyang <[email protected]> wrote:
> >> >>
> >> >>> Hi, FeatZhang. Thanks for driving this discussion. I've read through
> >> the
> >> >>> full FLIP and the mailing list context, and I have a few questions:
> >> >>> 1. If I understand correctly, in a Layered Data Architecture,
> >> >>> silver_events would typically be a table, a materialized view, or a
> >> >>> materialized table. From the mailing list discussion, it seems like
> no
> >> >>> consensus was reached on this point. I think we still need to
> consider
> >> >>> whether APPLY_WATERMARK should be allowed on (non-materialized)
> views.
> >> >>> 2. In the Planner Changes section under Logical Plan, could you
> >> elaborate
> >> >>> on why monotonicity guarantees need to be ensured for the watermark
> >> >>> expression validation?
> >> >>> 3. (nit) In the Watermark Override part under Planner Changes,
> >> shouldn't
> >> >>> the override of the upstream watermark happen at runtime rather than
> >> at the
> >> >>> planner level?
> >> >>> 4. I feel that APPLY_WATERMARK is quite similar to TO_CHANGELOG and
> >> >>> FROM_CHANGELOG. Is what we actually need just a
> >> >>> BuiltInProcessTableFunction? That way, we would only need to further
> >> extend
> >> >>> ProcessTableFunction to support this.
> >> >>> 5. If we choose to translate APPLY_WATERMARK into a specialized
> >> ExecNode
> >> >>> (similar to VECTOR_SEARCH and ML_PREDICT), would the existing
> >> >>> StreamExecWatermarkAssigner be sufficient for this purpose?
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>> --
> >> >>>
> >> >>>    Best！
> >> >>>    Xuyang
> >> >>>
> >> >>>
> >> >>>
> >> >>> At 2026-04-21 22:22:33, "FeatZhang" <[email protected]> wrote:
> >> >>>> Hi everyone,
> >> >>>>
> >> >>>> Thank you for the feedback and discussions on the initial proposal.
> >> I've
> >> >>>> revised the FLIP based on the community's input and would like to
> >> share
> >> >>>> the updated version.
> >> >>>>
> >> >>>>
> >> >>>> FLIP-XXX: Support Flexible Watermark Assignment via Built-in
> Function
> >> >>>> <
> >> >>>
> >>
> https://drive.google.com/open?id=17PXYAi6Pb91OqFhVVK7tRULiaHAb6wiX79jjC0NaaDA
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>> KEY UPDATES
> >> >>>> ===========
> >> >>>>
> >> >>>> The proposal has evolved from "Support Watermark Definition in SQL
> >> Views"
> >> >>>> to a more flexible and powerful approach: FLIP-XXX: Support
> Flexible
> >> >>>> Watermark Assignment via APPLY_WATERMARK Function.
> >> >>>>
> >> >>>> What's Changed:
> >> >>>>
> >> >>>> 1. Broader Scope: Instead of limiting watermark definitions to SQL
> >> views
> >> >>>>  only, the new proposal introduces a built-in table function
> >> >>>>  APPLY_WATERMARK that works with:
> >> >>>>  - Base tables
> >> >>>>  - Views (both regular and materialized)
> >> >>>>  - Subqueries
> >> >>>>  - Any table-valued expressions
> >> >>>>
> >> >>>> 2. More Flexible Design: The function-based approach provides:
> >> >>>>  - Dynamic watermark assignment at query time without modifying
> >> catalog
> >> >>>>    metadata
> >> >>>>  - Override capability for existing watermark strategies
> >> >>>>  - Composability with other SQL operations
> >> >>>>  - No need for DDL changes or catalog write permissions
> >> >>>>
> >> >>>> 3. Better SQL Semantics: Using a table function follows SQL
> standard
> >> >>>>  patterns and integrates naturally with Flink's existing function
> >> >>>>  ecosystem.
> >> >>>>
> >> >>>> UPDATED FLIP DOCUMENT
> >> >>>> =====================
> >> >>>>
> >> >>>> The revised FLIP is now available at:
> >> >>>> https://iwiki.woa.com/p/4019879693
> >> >>>>
> >> >>>> Key sections include:
> >> >>>> - Motivation and use cases
> >> >>>> - Public interfaces and SQL syntax
> >> >>>> - Implementation plan
> >> >>>> - Compatibility analysis
> >> >>>> - Test plan
> >> >>>>
> >> >>>> EXAMPLE USAGE
> >> >>>> =============
> >> >>>>
> >> >>>> -- Apply watermark to a view
> >> >>>> SELECT *FROM APPLY_WATERMARK(my_view, DESCRIPTOR(event_time),
> >> >>>> event_time - INTERVAL '5' SECOND);
> >> >>>> -- Override existing watermark strategy
> >> >>>> SELECT *FROM APPLY_WATERMARK(my_table_with_watermark,
> DESCRIPTOR(ts),
> >> >>>> ts - INTERVAL '10' SECOND -- Different from DDL watermark
> >> >>>> );
> >> >>>> -- Use in complex queries
> >> >>>> SELECT window_start,
> >> >>>>      window_end,
> >> >>>>      COUNT(*)FROM TABLE(TUMBLE(TABLE APPLY_WATERMARK(orders,
> >> >>>> DESCRIPTOR(order_time), order_time - INTERVAL '5' SECOND),
> >> >>>> DESCRIPTOR(order_time), INTERVAL '1' HOUR))GROUP BY window_start,
> >> >>>>        window_end;
> >> >>>>
> >> >>>>
> >> >>>> IMPLEMENTATION PROGRESS
> >> >>>> =======================
> >> >>>>
> >> >>>> I've also opened a draft PR #27984 with the initial implementation:
> >> >>>> - Core built-in function definition
> >> >>>> - SQL-to-RelNode conversion rules
> >> >>>> - Physical plan integration
> >> >>>> - Unit tests and documentation (English + Chinese)
> >> >>>>
> >> >>>> The PR is available at:
> >> >>>> https://github.com/apache/flink/pull/27984
> >> >>>>
> >> >>>> REQUEST FOR FEEDBACK
> >> >>>> ====================
> >> >>>>
> >> >>>> I would appreciate your thoughts on:
> >> >>>>
> >> >>>> 1. Function naming: Is APPLY_WATERMARK clear and intuitive?
> >> >>>>  (Alternative considered: WITH_WATERMARK, SET_WATERMARK)
> >> >>>>
> >> >>>> 2. DESCRIPTOR syntax: Using DESCRIPTOR(column_name) to specify the
> >> >>>>  rowtime column—does this align well with Flink's existing
> patterns?
> >> >>>>
> >> >>>> 3. Override behavior: Should APPLY_WATERMARK always override
> existing
> >> >>>>  watermarks, or should we provide a mode parameter
> >> >>>>  (e.g., OVERRIDE, MERGE)?
> >> >>>>
> >> >>>> 4. Performance considerations: Any concerns about the
> function-based
> >> >>>>  approach vs. catalog-level watermark definitions?
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>> Looking forward to your valuable feedback!
> >> >>>>
> >> >>>> Best regards,
> >> >>>> FeatZhang
> >> >>>>
> >> >>>> On Thu, Feb 12, 2026 at 6:24 PM Timo Walther <[email protected]>
> >> wrote:
> >> >>>>
> >> >>>>> Hi everyone,
> >> >>>>>
> >> >>>>> I think we all agree that we clearly want this functionality, just
> >> the
> >> >>>>> "how" needs to be discussed. I also like Lincoln’s suggestion of
> >> >>>>> introducing a built-in PTF for this, I had similar ideas in mind.
> >> >>>>>
> >> >>>>> There are two issues with a APPLY_WATERMARK function, but both on
> the
> >> >>>>> short-term roadmap:
> >> >>>>>
> >> >>>>> 1) This function would need to be a function that takes an
> >> expression.
> >> >>>>> Ideally as a lambda function. Newer Calcite versions have already
> >> lambda
> >> >>>>> expression support. At Confluent we were planning to work on a
> >> Calcite
> >> >>>>> upgrade this quarter especially to get lambda support in and
> improve
> >> >>>>> built-in functions that work on collections.
> >> >>>>>
> >> >>>>> 2) User-defined PTFs are currently not able to emit watermarks. We
> >> could
> >> >>>>> introduce a new interface WatermarkFunction (similar to
> >> >>>>> ChangelogFunction) that would offer this to everyone.
> Alternatively,
> >> we
> >> >>>>> could only use the PTF signature, but translate to a specialized
> >> >>>>> ExecNode similar how we do it for VECTOR_SEARCH and ML_PREDICT.
> >> >>>>>
> >> >>>>> In any case, even if we go with the function approach, we
> definitely
> >> >>>>> need a full FLIP on this.
> >> >>>>>
> >> >>>>> Thanks,
> >> >>>>> Timo
> >> >>>>>
> >> >>>>> On 12.02.26 08:25, Gyula Fóra wrote:
> >> >>>>>> Hi All!
> >> >>>>>> I would like to chime in here quickly from a slightly different
> >> angle.
> >> >>>>>> While I am the first to admit that I cannot grasp all the
> planning /
> >> >>>>>> conceptual implications, I also feel the need for more flexible
> >> >>> watermark
> >> >>>>>> handling as suggested by Feat.
> >> >>>>>>
> >> >>>>>> Anything that can only be applied to base/catalog tables is very
> >> >>> limiting
> >> >>>>>> from a usability perspective. Watermarks feel like they should
> be a
> >> >>>>> simple
> >> >>>>>> function that you can apply on a column/table as part of a
> >> query/view.
> >> >>>>> For
> >> >>>>>> example extract timestamp from a string convert to TS -> apply
> >> >>> watermark
> >> >>>>>> etc.
> >> >>>>>>
> >> >>>>>> Users often receive the tables/catalogs as given and can only
> write
> >> >>>>>> queries.
> >> >>>>>>
> >> >>>>>> Fixing this would eliminate a long standing disconnect between
> the
> >> >>>>>> datastream api flexible watermark handling compared to the
> currently
> >> >>> very
> >> >>>>>> restrictive SQL approach.
> >> >>>>>>
> >> >>>>>> Cheers
> >> >>>>>> Gyula
> >> >>>>>>
> >> >>>>>> On Thu, Feb 12, 2026 at 7:54 AM FeatZhang <[email protected]>
> >> >>> wrote:
> >> >>>>>>
> >> >>>>>>> Hi Timo, Lincoln,
> >> >>>>>>>
> >> >>>>>>> Thank you both for the detailed feedback.
> >> >>>>>>>
> >> >>>>>>> I agree with the concern that non-materialized SQL views should
> >> >>> remain a
> >> >>>>>>> pure logical abstraction. Introducing watermark definitions
> >> directly
> >> >>>>>>> into CREATE
> >> >>>>>>> VIEW or ALTER VIEW could blur the boundary between logical
> aliasing
> >> >>> and
> >> >>>>>>> physical planning semantics, especially considering optimization
> >> >>>>> barriers
> >> >>>>>>> and watermark propagation behavior.
> >> >>>>>>>
> >> >>>>>>> Lincoln’s suggestion of introducing a built-in function such as:
> >> >>>>>>>
> >> >>>>>>> APPLY_WATERMARK(table, DESCRIPTOR(rowtime_column),
> >> >>> watermark_expression)
> >> >>>>>>>
> >> >>>>>>> is a cleaner direction. It keeps watermark definition as an
> >> explicit
> >> >>>>>>> relational transformation rather than attaching additional
> >> semantics
> >> >>> to
> >> >>>>>>> views.
> >> >>>>>>>
> >> >>>>>>> However, to fully address the original use cases (especially
> >> logical
> >> >>>>> reuse
> >> >>>>>>> and layered lakehouse architectures), I propose that the table
> >> >>> parameter
> >> >>>>>>> should support:
> >> >>>>>>>
> >> >>>>>>>    - Base tables
> >> >>>>>>>    - Non-materialized views
> >> >>>>>>>
> >> >>>>>>> If APPLY_WATERMARK can accept both, we can:
> >> >>>>>>>
> >> >>>>>>>    - Preserve the conceptual purity of SQL views
> >> >>>>>>>    - Avoid redefining view semantics
> >> >>>>>>>    - Still enable logical reuse via views
> >> >>>>>>>    - Allow different watermark strategies over the same logical
> >> >>>>> relation
> >> >>>>>>>
> >> >>>>>>> In other words, watermark definition becomes an explicit
> relational
> >> >>>>>>> operator applied on top of any logical relation, instead of
> being
> >> >>>>> embedded
> >> >>>>>>> into the view definition itself.
> >> >>>>>>>
> >> >>>>>>> From a planner perspective, this keeps the model consistent:
> >> >>>>>>>
> >> >>>>>>>    - The function expands into a relational node
> >> >>>>>>>    - No optimization barrier is introduced by views
> >> >>>>>>>    - Watermark handling remains part of the logical plan
> >> >>> transformation
> >> >>>>>>>
> >> >>>>>>> I will prepare a PR to prototype APPLY_WATERMARK with support
> for
> >> >>> both
> >> >>>>> base
> >> >>>>>>> tables and non-materialized views, and share it for further
> >> >>> discussion.
> >> >>>>>>>
> >> >>>>>>> Looking forward to your thoughts.
> >> >>>>>>>
> >> >>>>>>> Best,
> >> >>>>>>> Feat
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>> Lincoln Lee <[email protected]> 于2026年2月12日周四 12:19写道：
> >> >>>>>>>
> >> >>>>>>>> Agree with Timo’s point regarding the conceptual semantics. We
> >> >>> should
> >> >>>>> not
> >> >>>>>>>> directly extend non-materialized views with additional
> watermark
> >> >>>>>>>> definitions.
> >> >>>>>>>>
> >> >>>>>>>> Regarding the use case mentioned by Feat, defining different
> >> >>> watermark
> >> >>>>>>>> strategies
> >> >>>>>>>> for the same data source, especially in the case of catalog
> >> tables,
> >> >>> we
> >> >>>>>>> are
> >> >>>>>>>> exploring a possible solution introducing a built-in function:
> >> >>>>>>>> ```sql
> >> >>>>>>>> APPLY_WATERMARK(table, DESCRIPTOR(rowtime_column),
> >> >>>>> watermark_expression)
> >> >>>>>>>> ```
> >> >>>>>>>> This function only support base tables as input and not support
> >> >>> views,
> >> >>>>>>>> subqueries
> >> >>>>>>>> or derived relations.
> >> >>>>>>>>
> >> >>>>>>>> This would address a meaningful subset of the identified use
> cases
> >> >>>>>>> without
> >> >>>>>>>> redefining the role of SQL views.
> >> >>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>> Best,
> >> >>>>>>>> Lincoln Lee
> >> >>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>> Timo Walther <[email protected]> 于2026年2月11日周三 23:29写道：
> >> >>>>>>>>
> >> >>>>>>>>> Hi Feat,
> >> >>>>>>>>>
> >> >>>>>>>>> thanks for proposing this FLIP. We had similar discussions in
> the
> >> >>>>> past,
> >> >>>>>>>>> but so far could never reach consensus.
> >> >>>>>>>>>
> >> >>>>>>>>> SQL views are actually a very simple concept, they just give
> SQL
> >> >>> text
> >> >>>>>>> an
> >> >>>>>>>>> alias. A view has no other properties except for the view
> >> >>> definition.
> >> >>>>>>>>> Everything else is dynamically computed when the SQL text is
> >> >>> inserted
> >> >>>>>>>>> into the larger plan. A view is never evaluated without the
> >> >>>>> surrounding
> >> >>>>>>>>> plan.
> >> >>>>>>>>>
> >> >>>>>>>>> Watermarks in the middle of a pipeline raise a couple of
> tricky
> >> >>>>> issues:
> >> >>>>>>>>>
> >> >>>>>>>>> - What if the upstream table is updating, how would you deal
> with
> >> >>>>>>>>> watermarks in the downstream view?
> >> >>>>>>>>> - What if the upstream table emits already watermarks? Would
> the
> >> >>> view
> >> >>>>>>>>> catch them and discard this information?
> >> >>>>>>>>> - A view usually dissolves into the plan (e.g. via projection
> or
> >> >>>>> filter
> >> >>>>>>>>> pushdown). Would a watermark definition suddenly introduce an
> >> >>>>>>>>> optimization barrier? If this is an optimization barrier, is
> this
> >> >>>>> still
> >> >>>>>>>>> a view or a new concept? E.g. a "materialized view" or
> >> "pre-planned
> >> >>>>>>>> view"?
> >> >>>>>>>>>
> >> >>>>>>>>> Cheers,
> >> >>>>>>>>> Timo
> >> >>>>>>>>>
> >> >>>>>>>>>
> >> >>>>>>>>>
> >> >>>>>>>>> On 11.02.26 03:51, FeatZhang wrote:
> >> >>>>>>>>>> Hi Flink Community,
> >> >>>>>>>>>>
> >> >>>>>>>>>> I'd like to propose adding watermark support for SQL Views in
> >> >>> Flink
> >> >>>>>>> to
> >> >>>>>>>>>> better support event-time processing scenarios.
> >> >>>>>>>>>> Problem Statement
> >> >>>>>>>>>>
> >> >>>>>>>>>> Currently, Flink SQL views cannot define watermarks. This
> >> >>> limitation
> >> >>>>>>>>>> creates several challenges:
> >> >>>>>>>>>>
> >> >>>>>>>>>>     - *Broken Abstraction*: Users must reference underlying
> >> table
> >> >>>>>>>>> watermarks
> >> >>>>>>>>>>     directly, exposing implementation details
> >> >>>>>>>>>>     - *No Flexibility*: Cannot define different watermark
> >> >>> strategies
> >> >>>>>>>> for
> >> >>>>>>>>>>     different use cases on the same data source
> >> >>>>>>>>>>     - *Limited Architecture Support*: Incompatible with
> modern
> >> >>>>>>> layered
> >> >>>>>>>>> data
> >> >>>>>>>>>>     architectures (Bronze/Silver/Gold medallion pattern)
> >> >>>>>>>>>>
> >> >>>>>>>>>> For example, in multi-tenant scenarios, different tenants may
> >> >>> require
> >> >>>>>>>>>> different lateness tolerance, but currently we cannot create
> >> views
> >> >>>>>>> with
> >> >>>>>>>>>> different watermark strategies on the same source table.
> >> >>>>>>>>>> Proposed Solution
> >> >>>>>>>>>>
> >> >>>>>>>>>> I propose adding two SQL syntax options to support watermark
> >> >>>>>>>> definitions
> >> >>>>>>>>> in
> >> >>>>>>>>>> views:
> >> >>>>>>>>>>
> >> >>>>>>>>>> *Option 1: CREATE VIEW with WATERMARK*
> >> >>>>>>>>>>
> >> >>>>>>>>>> CREATE VIEW user_activity
> >> >>>>>>>>>> WATERMARK FOR event_time AS event_time - INTERVAL '5'
> SECONDAS
> >> >>> SELECT
> >> >>>>>>>>>> user_id, event_time, action FROM raw_events;
> >> >>>>>>>>>>
> >> >>>>>>>>>> *Option 2: ALTER VIEW SET WATERMARK*
> >> >>>>>>>>>>
> >> >>>>>>>>>> ALTER VIEW user_activity SET WATERMARK FOR event_time AS
> >> >>> event_time -
> >> >>>>>>>>>> INTERVAL '5' SECOND;
> >> >>>>>>>>>>
> >> >>>>>>>>>> Key Design Aspects
> >> >>>>>>>>>>
> >> >>>>>>>>>>     - *Backward Compatibility*: Watermark stored as optional
> >> >>>>> metadata
> >> >>>>>>>> in
> >> >>>>>>>>>>     view options; existing views continue to work unchanged
> >> >>>>>>>>>>     - *Validation*: Watermark column must exist in view
> schema
> >> >>> and
> >> >>>>> be
> >> >>>>>>>> of
> >> >>>>>>>>>>     TIMESTAMP/TIMESTAMP_LTZ type
> >> >>>>>>>>>>     - *Storage*: Watermark metadata persists in catalog
> options
> >> >>> map
> >> >>>>>>>>> (works
> >> >>>>>>>>>>     with all catalog implementations)
> >> >>>>>>>>>>     - *Propagation*: Follows existing Flink watermark
> >> propagation
> >> >>>>>>> rules
> >> >>>>>>>>> in
> >> >>>>>>>>>>     joins and nested views
> >> >>>>>>>>>>
> >> >>>>>>>>>> Use Case Example: Data Lakehouse Architecture
> >> >>>>>>>>>>
> >> >>>>>>>>>> -- Bronze: Raw data (no watermark)CREATE TABLE bronze_events
> >> >>>>>>> (raw_data
> >> >>>>>>>>>> STRING, ingestion_time TIMESTAMP(3)) WITH (...);
> >> >>>>>>>>>> -- Silver: Cleaned data with watermarkCREATE VIEW
> silver_events
> >> >>>>>>>>>> WATERMARK FOR event_time AS event_time - INTERVAL '10'
> SECONDAS
> >> >>>>>>> SELECT
> >> >>>>>>>>>>      CAST(JSON_VALUE(raw_data, '$.event_time') AS
> TIMESTAMP(3))
> >> >>> AS
> >> >>>>>>>>> event_time,
> >> >>>>>>>>>>      JSON_VALUE(raw_data, '$.user_id') AS user_idFROM
> >> >>>>>>>>>> bronze_eventsWHERE JSON_VALUE(raw_data, '$.event_time') IS
> NOT
> >> >>> NULL;
> >> >>>>>>>>>> -- Gold: AggregationsSELECT TUMBLE_START(event_time, INTERVAL
> >> '1'
> >> >>>>>>>>>> HOUR), COUNT(*)FROM silver_eventsGROUP BY TUMBLE(event_time,
> >> >>> INTERVAL
> >> >>>>>>>>>> '1' HOUR);
> >> >>>>>>>>>>
> >> >>>>>>>>>> Reference Materials
> >> >>>>>>>>>>
> >> >>>>>>>>>>     - FLIP Document: FLIP-XXX: Support Watermark in Flink SQL
> >> >>> View
> >> >>>>>>>>>>     <
> >> >>>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>
> >> >>>>>
> >> >>>
> >>
> https://docs.google.com/document/d/1OBGTi3Xb-Kpcf_nHeKA30XiRPEKV4PU7FpieXQhWn7Y/edit?usp=sharing
> >> >>>>>>>>>>
> >> >>>>>>>>>>     - JIRA Issue:
> >> >>> https://issues.apache.org/jira/browse/FLINK-39062
> >> >>>>>>>>>>     - Implementation POC:
> >> >>>>>>>>>>     - [FLINK-39062][table] Support WATERMARK clause in CREATE
> >> >>> VIEW
> >> >>>>>>>>> statement
> >> >>>>>>>>>>     <https://github.com/apache/flink/pull/27571>
> >> >>>>>>>>>>     - [FLINK-39062][table] Support ALTER VIEW SET WATERMARK
> >> >>> syntax
> >> >>>>>>>>>>     <https://github.com/apache/flink/pull/27570>
> >> >>>>>>>>>>
> >> >>>>>>>>>> Implementation Timeline
> >> >>>>>>>>>>
> >> >>>>>>>>>> Estimated 6-8 weeks covering parser layer, planner layer,
> >> catalog
> >> >>>>>>>>>> integration, and comprehensive testing.
> >> >>>>>>>>>> Request for Feedback
> >> >>>>>>>>>>
> >> >>>>>>>>>> This enhancement would significantly improve Flink's support
> for
> >> >>>>>>>> layered
> >> >>>>>>>>>> data architectures and flexible event-time processing. I'm
> happy
> >> >>> to
> >> >>>>>>>>> provide
> >> >>>>>>>>>> more details or start a formal FLIP process if the community
> >> sees
> >> >>>>>>> value
> >> >>>>>>>>> in
> >> >>>>>>>>>> this proposal.
> >> >>>>>>>>>>
> >> >>>>>>>>>> Looking forward to the community's feedback!
> >> >>>>>>>>>>
> >> >>>>>>>>>> Best regards,
> >> >>>>>>>>>>
> >> >>>>>>>>>> Feat Zhang
> >> >>>>>>>>>>
> >> >>>>>>>>>>   FLIP-XXX: Support Watermark in Flink SQL View
> >> >>>>>>>>>> <
> >> >>>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>
> >> >>>>>
> >> >>>
> >>
> https://drive.google.com/open?id=1OBGTi3Xb-Kpcf_nHeKA30XiRPEKV4PU7FpieXQhWn7Y
> >> >>>>>>>>>>
> >> >>>>>>>>>>
> >> >>>>>>>>>
> >> >>>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>
> >> >>>>>>
> >> >>>>>
> >> >>>>>
> >> >>>
> >>
>

Re: Re: Re: [DISCUSS] FLIP-XXX: Support Watermark Definition in SQL Views

Reply via email to