Liu created FLINK-39030:
---------------------------

             Summary: Support emit-only-on-update mode for CUMULATE window TVF 
to reduce unnecessary outputs
                 Key: FLINK-39030
                 URL: https://issues.apache.org/jira/browse/FLINK-39030
             Project: Flink
          Issue Type: Improvement
          Components: Table SQL / API
            Reporter: Liu


h1. Summary

Currently, the CUMULATE window TVF (Table-Valued Function) triggers output at 
every step interval, regardless of whether new data arrived during that period. 
This behavior leads to unnecessary repeated outputs and increased downstream 
processing overhead in scenarios where data arrives sparsely.
This proposal introduces an optional emit-only-on-update mode for CUMULATE 
windows that only emits results when new data actually arrives within a step 
interval.
h1. Motivation

Consider a CUMULATE window with maxSize=15 seconds and step=5 seconds, starting 
at 00:00:00:

 
||Window||Time range||Triggered at||
|1|[00:00:00, 00:00:05)|00:00:05|
|2|[00:00:00, 00:00:10)|00:00:10|
|3|[00:00:00, 00:00:15)|00:00:15|


*Problem:* If data only arrives at 00:00:02, the current implementation still 
emits three outputs (at 00:00:05, 00:00:10, and 00:00:15), even though windows 
2 and 3 contain no new data compared to window 1.
*Impact:*

 
 * Unnecessary computational overhead
 * Increased network I/O for downstream operators
 * Higher storage costs when persisting results
 * Significant performance degradation in scenarios with sparse data 
distribution

h1. Proposed Solution

Add an optional 6th parameter emit_on_update_only (BOOLEAN, default false) to 
the CUMULATE TVF:
{code:java}
CUMULATE(
    TABLE data_table,
    DESCRIPTOR(rowtime),
    INTERVAL '5' SECOND,    -- step
    INTERVAL '15' SECOND,   -- maxSize  
    INTERVAL '0' SECOND,    -- offset
    true                    -- emit_on_update_only (NEW)
) {code}
Behavior when emit_on_update_only = true:
 * Window output is only triggered when new data arrives within the current 
step interval
 * If no new data arrives, the window silently advances without emitting results
 * This is particularly useful for sparse data streams where most step 
intervals have no data

Behavior when emit_on_update_only = false (default):
 * Current behavior is preserved (output at every step interval)
 * Ensures backward compatibility



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to