Shaoxuan Wang created FLINK-6216:
------------------------------------

             Summary: DataStream unbounded groupby aggregate with early firing
                 Key: FLINK-6216
                 URL: https://issues.apache.org/jira/browse/FLINK-6216
             Project: Flink
          Issue Type: New Feature
          Components: Table API & SQL
            Reporter: Shaoxuan Wang
            Assignee: Shaoxuan Wang


Groupby aggregate results in a replace table. For infinite groupby aggregate, 
we need a mechanism to define when the data should be emitted (early-fired). 
This task is aimed to implement the initial version of unbounded groupby 
aggregate, where we update and emit aggregate value per each arrived record. In 
the future, we will implement the mechanism and interface to let user define 
the frequency/period of early-firing the unbounded groupby aggregation results.

The limit space of backend state is one of major obstacles for supporting 
unbounded groupby aggregate in practical. Due to this reason, we suggest two 
common (and very useful) use-cases of this unbounded groupby aggregate:
1. The range of grouping key is limit. In this case, a new arrival record will 
either insert to state as new record or replace the existing record in the 
backend state. The data in the backend state will not be evicted if the 
resource is properly provisioned by the user, such that we can provision the 
correctness on aggregation results.
2. When the grouping key is unlimited, we will not be able ensure the 100% 
correctness of "unbounded groupby aggregate". In this case, we will reply on 
the TTL mechanism of the RocksDB backend state to evicted old data such that we 
can provision the correct results in a certain time range.




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to