[ 
https://issues.apache.org/jira/browse/FLINK-39030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-39030:
-----------------------------------
    Labels: pull-request-available  (was: )

> Support emit-only-on-update mode for CUMULATE window TVF to reduce 
> unnecessary outputs
> --------------------------------------------------------------------------------------
>
>                 Key: FLINK-39030
>                 URL: https://issues.apache.org/jira/browse/FLINK-39030
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table SQL / API
>            Reporter: Liu
>            Priority: Major
>              Labels: pull-request-available
>
> h1. Summary
> Currently, the CUMULATE window TVF (Table-Valued Function) triggers output at 
> every step interval, regardless of whether new data arrived during that 
> period. This behavior leads to unnecessary repeated outputs and increased 
> downstream processing overhead in scenarios where data arrives sparsely.
> This proposal introduces an optional emit-only-on-update mode for CUMULATE 
> windows that only emits results when new data actually arrives within a step 
> interval.
> h1. Motivation
> Consider a CUMULATE window with maxSize=15 seconds and step=5 seconds, 
> starting at 00:00:00:
>  
> ||Window||Time range||Triggered at||
> |1|[00:00:00, 00:00:05)|00:00:05|
> |2|[00:00:00, 00:00:10)|00:00:10|
> |3|[00:00:00, 00:00:15)|00:00:15|
> *Problem:* If data only arrives at 00:00:02, the current implementation still 
> emits three outputs (at 00:00:05, 00:00:10, and 00:00:15), even though 
> windows 2 and 3 contain no new data compared to window 1.
> *Impact:*
>  
>  * Unnecessary computational overhead
>  * Increased network I/O for downstream operators
>  * Higher storage costs when persisting results
>  * Degraded performance on sparse streams, where most step intervals only 
> re-emit an unchanged result
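> A minimal sketch of a query that exhibits this behavior with the existing 
> CUMULATE syntax, assuming data_table has a rowtime time attribute as in the 
> example below (the COUNT(*) aggregate and the cnt alias are illustrative 
> choices, not taken from this issue):
> {code:sql}
> -- Existing CUMULATE syntax: step = 5 s, maxSize = 15 s.
> SELECT window_start, window_end, COUNT(*) AS cnt
> FROM TABLE(
>     CUMULATE(
>         TABLE data_table,
>         DESCRIPTOR(rowtime),
>         INTERVAL '5' SECOND,    -- step
>         INTERVAL '15' SECOND))  -- maxSize
> GROUP BY window_start, window_end;
> {code}
> With a single row at 00:00:02, this query is expected to produce one result 
> for each of the three windows in the table above, each with cnt = 1.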
> h1. Proposed Solution
> Add an optional 6th parameter emit_on_update_only (BOOLEAN, default false) to 
> the CUMULATE TVF:
> {code:sql}
> CUMULATE(
>     TABLE data_table,
>     DESCRIPTOR(rowtime),
>     INTERVAL '5' SECOND,    -- step
>     INTERVAL '15' SECOND,   -- maxSize
>     INTERVAL '0' SECOND,    -- offset
>     true                    -- emit_on_update_only (NEW)
> ) {code}
> Behavior when emit_on_update_only = true:
>  * Window output is triggered only when new data arrives within the current 
> step interval
>  * If no new data arrives, the window silently advances without emitting 
> results
>  * This is particularly useful for sparse data streams where most step 
> intervals have no data
> Behavior when emit_on_update_only = false (default):
>  * Current behavior is preserved (output at every step interval)
>  * Ensures backward compatibility
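> For illustration, the flag might appear in a full query as follows. This is 
> only a sketch of the proposed syntax: emit_on_update_only is the parameter 
> proposed in this issue, not an existing Flink option, and the COUNT(*) 
> aggregate and cnt alias are illustrative.
> {code:sql}
> -- Hypothetical usage of the proposed sixth argument.
> SELECT window_start, window_end, COUNT(*) AS cnt
> FROM TABLE(
>     CUMULATE(
>         TABLE data_table,
>         DESCRIPTOR(rowtime),
>         INTERVAL '5' SECOND,    -- step
>         INTERVAL '15' SECOND,   -- maxSize
>         INTERVAL '0' SECOND,    -- offset
>         true))                  -- emit_on_update_only (proposed)
> GROUP BY window_start, window_end;
> {code}
> Under the proposed semantics, a single row at 00:00:02 would yield one 
> result, for window [00:00:00, 00:00:05), and the steps ending at 00:00:10 
> and 00:00:15 would emit nothing.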



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
