Shaoxuan Wang created FLINK-6216:
------------------------------------

             Summary: DataStream unbounded groupby aggregate with early firing
                 Key: FLINK-6216
                 URL: https://issues.apache.org/jira/browse/FLINK-6216
             Project: Flink
          Issue Type: New Feature
          Components: Table API & SQL
            Reporter: Shaoxuan Wang
            Assignee: Shaoxuan Wang

A groupby aggregate results in a replace table. For an unbounded (infinite) groupby aggregate, we need a mechanism to define when the data should be emitted (early-fired). This task aims to implement the initial version of the unbounded groupby aggregate, in which we update and emit the aggregate value for each arriving record. In the future, we will implement the mechanism and interface that let the user define the frequency/period of early-firing the unbounded groupby aggregation results.

The limited space of the backend state is one of the major obstacles to supporting unbounded groupby aggregates in practice. For this reason, we suggest two common (and very useful) use cases for this unbounded groupby aggregate:

1. The range of the grouping key is limited. In this case, a newly arrived record is either inserted into the state as a new record or replaces the existing record in the backend state. The data in the backend state will not be evicted as long as the user provisions resources properly, so we can guarantee the correctness of the aggregation results.

2. The range of the grouping key is unlimited. In this case, we cannot guarantee 100% correctness of the "unbounded groupby aggregate". We will rely on the TTL mechanism of the RocksDB backend state to evict old data, so that we can provide correct results within a certain time range.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
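
To illustrate the per-record early firing and the two use cases described in this issue, here is a minimal Python sketch. It is not Flink code and not the proposed implementation. The class name UnboundedGroupBySum, the choice of SUM as the aggregate, the explicit `now` timestamp, and the lazy expiry check are all assumptions made for illustration only; how the RocksDB backend actually applies its TTL may differ.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, Optional, Tuple


@dataclass
class _Entry:
    total: float        # running SUM for this grouping key
    last_update: float  # time of the last record seen for this key


class UnboundedGroupBySum:
    """Toy model of an unbounded groupby SUM that fires on every record.

    Hypothetical illustration only: names and TTL behavior are assumptions,
    not part of Flink's API.
    """

    def __init__(self, ttl: Optional[float] = None) -> None:
        # ttl=None models use case 1 (limited key range, nothing is evicted).
        # A numeric ttl models use case 2 (old state is dropped).
        self.ttl = ttl
        self.state: Dict[Hashable, _Entry] = {}

    def process(self, key: Hashable, value: float, now: float) -> Tuple[Hashable, float]:
        entry = self.state.get(key)
        # Assumption: state idle for longer than the TTL is treated as evicted.
        if entry is not None and self.ttl is not None and now - entry.last_update > self.ttl:
            entry = None
        # Insert a new record, or replace the existing one for this key.
        total = value if entry is None else entry.total + value
        self.state[key] = _Entry(total, now)
        # Early firing: emit the updated aggregate for every arriving record.
        return key, total


if __name__ == "__main__":
    agg = UnboundedGroupBySum(ttl=10.0)
    for now, (key, value) in [(0.0, ("a", 1.0)), (5.0, ("a", 2.0)), (20.0, ("a", 4.0))]:
        print(agg.process(key, value, now))
```

Each emitted (key, aggregate) pair replaces the previous row for that key downstream, which is the replace-table behavior mentioned above. With a TTL set, the third record in the usage example arrives after the state for its key has expired, so the sum restarts from that record; the result is therefore only correct within the TTL time range.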