[ 
https://issues.apache.org/jira/browse/FLINK-17099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dalongliu updated FLINK-17099:
------------------------------
    Description: 
At the moment, there are two ways to clean up state.

1) Registering a processing-time timer and cleaning up entries when the timer 
fires.
 - pros: can clean up multiple states at the same time (consistent state)
 - cons: timer space grows with the number of keys, which may lead to OOM (heap 
timers).
 - used in Group Aggregation, Over Aggregation, TopN

2) Using the {{StateTtlConfig}} provided by DataStream [1].
 - pros: decouples the state TTL logic from record processing and is easy to 
program (compare the old planner's NonWindowJoin, which bundles the TTL 
timestamp with records in MapState).
 - cons: can't clean up multiple states at the same time.
 - used in Stream-Stream Joins.
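
The trade-offs listed for the two approaches can be illustrated with a toy 
model. This is plain Java, not Flink code: the class names, the two example 
states (a count and a last value) and the lazy expire-on-read behaviour are 
all assumptions made for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model (not Flink code) contrasting the two cleanup strategies. */
public class StateCleanupModel {

    /** Approach 1: one cleanup timer per key, kept next to the state. */
    static class TimerCleanup {
        final Map<String, Long> countState = new HashMap<>();
        final Map<String, String> lastValueState = new HashMap<>();
        // One timer entry per live key: the extra per-key footprint.
        final Map<String, Long> timers = new HashMap<>();
        final long retentionMs;

        TimerCleanup(long retentionMs) {
            this.retentionMs = retentionMs;
        }

        void process(String key, String value, long now) {
            countState.merge(key, 1L, Long::sum);
            lastValueState.put(key, value);
            timers.put(key, now + retentionMs); // (re)register the cleanup timer
        }

        /** Fires all due timers; each one clears every state of its key together. */
        void advanceTo(long now) {
            timers.entrySet().removeIf(e -> {
                if (e.getValue() > now) {
                    return false;
                }
                countState.remove(e.getKey());
                lastValueState.remove(e.getKey());
                return true;
            });
        }
    }

    /** Approach 2: every state entry carries its own expiration timestamp. */
    static class TtlState<V> {
        final Map<String, V> values = new HashMap<>();
        final Map<String, Long> expiresAt = new HashMap<>();
        final long ttlMs;

        TtlState(long ttlMs) {
            this.ttlMs = ttlMs;
        }

        void put(String key, V value, long now) {
            values.put(key, value);
            expiresAt.put(key, now + ttlMs);
        }

        /** Expired entries are dropped on read and never returned. */
        V get(String key, long now) {
            Long deadline = expiresAt.get(key);
            if (deadline == null) {
                return null;
            }
            if (deadline <= now) {
                values.remove(key);
                expiresAt.remove(key);
                return null;
            }
            return values.get(key);
        }
    }

    public static void main(String[] args) {
        TimerCleanup timer = new TimerCleanup(1000);
        timer.process("a", "x", 0);
        timer.advanceTo(1000); // the timer fires: both states of "a" go together
        System.out.println(timer.countState.isEmpty() && timer.lastValueState.isEmpty());

        TtlState<Long> count = new TtlState<>(1000);
        TtlState<String> last = new TtlState<>(1000);
        count.put("a", 1L, 0);
        last.put("a", "x", 500); // written later, so it expires later
        System.out.println(count.get("a", 1200)); // already expired
        System.out.println(last.get("a", 1200));  // still live
    }
}
```

In this model the timer variant needs one extra map entry per key but clears 
all states of a key in one step, while the TTL variant needs no timers but 
lets the two states of the same key expire at different moments.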

The timer solution can clean up multiple states at the same time, but it can 
lead to OOM when there are a great many state keys. In addition, StateTtlConfig 
is already used for stream-stream joins and will be used in more operators. 
Therefore, to unify the state TTL solution, simplify the implementation, and 
improve code readability, we should refactor state cleanup to use 
StateTtlConfig instead of processing-time timers in the Group Aggregation, 
Deduplication, and TopN operators.
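
As a rough sketch of what the StateTtlConfig-based variant could look like on 
the operator side: the fragment below assumes flink-core (the 1.9 to 1.11 era 
API, where the builder takes a {{Time}}) is on the classpath. The class name, 
the helper method, the state name "accState" and the choice of update type and 
visibility are hypothetical and not taken from the actual patch.

```java
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.time.Time;

public class TtlDescriptorSketch {

    // Hypothetical helper: builds a TTL-enabled descriptor for an accumulator state.
    static ValueStateDescriptor<Long> accumulatorDescriptor(long retentionMs) {
        StateTtlConfig ttlConfig = StateTtlConfig
                .newBuilder(Time.milliseconds(retentionMs))
                // Refresh the expiration timestamp whenever the state is written.
                .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
                // Never hand an expired value back to the operator.
                .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
                .build();

        ValueStateDescriptor<Long> descriptor =
                new ValueStateDescriptor<>("accState", Long.class);
        descriptor.enableTimeToLive(ttlConfig);
        return descriptor;
    }
}
```

With a descriptor like this, the operator would not need to register a 
processing-time timer or implement a cleanup callback for that state.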

  was:
At the moment, there are two ways to clean up state.

1) Registering a processing-time timer and cleaning up entries when the timer 
fires.
 - pros: can clean up multiple states at the same time (consistent state)
 - cons: timer space grows with the number of keys, which may lead to OOM (heap 
timers).
 - used in Group Aggregation, Over Aggregation, TopN

2) Using the {{StateTtlConfig}} provided by DataStream [1].
 - pros: decouples the state TTL logic from record processing and is easy to 
program (compare the old planner's NonWindowJoin, which bundles the TTL 
timestamp with records in MapState).
 - cons: can't clean up multiple states at the same time.
 - used in Stream-Stream Joins.

The timer solution can clean up multiple states at the same time, but it can 
lead to OOM when there are a great many state keys. In addition, StateTtlConfig 
is already used for stream-stream joins and will be used in more operators. 
Therefore, to unify the state TTL solution, simplify the implementation, and 
improve code readability, we should refactor state cleanup to use 
StateTtlConfig instead of processing-time timers in the Group Aggregation, 
Over Aggregation, TopN operators, etc.


> Refactoring State TTL solution in Group Agg、Deduplication、TopN operator 
> replace Timer with StateTtlConfig
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-17099
>                 URL: https://issues.apache.org/jira/browse/FLINK-17099
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table SQL / Runtime
>    Affects Versions: 1.9.0, 1.10.0
>            Reporter: dalongliu
>            Assignee: dalongliu
>            Priority: Major
>             Fix For: 1.11.0


