[
https://issues.apache.org/jira/browse/FLINK-16581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17065782#comment-17065782
]
Jark Wu commented on FLINK-16581:
---------------------------------
Hi [~lzljs3620320], it's true that a lot of operators still use timers.
That's because {{StateTtlConfig}} was introduced only in recent releases and we
haven't had much time to refactor the existing operators. We use
{{StateTtlConfig}} because it simplifies the implementation A LOT.
> can't cleanup multiple states at the same time
For example, in COUNT DISTINCT there are 2 states: the {{MapState}} stores all
distinct values, and {{count}} stores the size of the MapState. If we use
{{StateTtlConfig}}, some entries of the MapState may expire while {{count}} does
not. If an expired value comes in again, {{count}} is incremented by mistake.
If we use a timer, the MapState and {{count}} are reset together.
But I think that's not a big problem, because the result is not correct anyway
once the TTL kicks in.
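To make the COUNT DISTINCT case concrete, here is a minimal sketch in plain Java. It only simulates the two cleanup strategies with ordinary collections and is not the operator code or the Flink API. The class and method names are hypothetical, and it assumes the TTL is refreshed on every write.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class CountDistinctTtlSketch {

    /** Simulates per-entry TTL: each map entry and the counter expire independently. */
    public static long countWithPerEntryTtl(long ttl, long[] times, String[] values) {
        Map<String, Long> lastSeen = new HashMap<>(); // stands in for the MapState
        long count = 0;                               // stands in for the count state
        long countWrittenAt = Long.MIN_VALUE;         // last write time of the counter
        for (int i = 0; i < values.length; i++) {
            long now = times[i];
            // The counter has its own TTL, refreshed by every write below.
            if (countWrittenAt != Long.MIN_VALUE && now - countWrittenAt >= ttl) {
                count = 0;
            }
            Long seenAt = lastSeen.get(values[i]);
            boolean expired = seenAt != null && now - seenAt >= ttl;
            if (seenAt == null || expired) {
                count++; // an expired entry looks like a new value and is counted again
            }
            lastSeen.put(values[i], now);
            countWrittenAt = now;
        }
        return count;
    }

    /** Simulates one cleanup timer per key that clears both states together. */
    public static long countWithSharedTimer(long ttl, long[] times, String[] values) {
        Set<String> seen = new HashSet<>();
        long count = 0;
        long cleanupAt = Long.MAX_VALUE; // no timer registered yet
        for (int i = 0; i < values.length; i++) {
            long now = times[i];
            if (now >= cleanupAt) { // the timer fired: reset both states at once
                seen.clear();
                count = 0;
            }
            if (seen.add(values[i])) {
                count++;
            }
            cleanupAt = now + ttl; // each record pushes the cleanup time forward
        }
        return count;
    }

    public static void main(String[] args) {
        long[] times = {0, 6, 12};
        String[] values = {"a", "b", "a"};
        System.out.println("per-entry TTL: " + countWithPerEntryTtl(10, times, values));
        System.out.println("shared timer:  " + countWithSharedTimer(10, times, values));
    }
}
```

With a TTL of 10 and the inputs "a" at t=0, "b" at t=6 and "a" at t=12, the per-entry variant returns 3: the entry for "a" has expired, but the counter was refreshed at t=6 and is still alive. The shared-timer variant returns 2, because the key stayed active and nothing was cleared.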
In {{RetractableTopNFunction}} there are also multiple states: the
{{dataState}}, which stores all input data, and the {{treeMap}}, which stores the
TopN elements in order.
> Minibatch deduplication lack state TTL
> --------------------------------------
>
> Key: FLINK-16581
> URL: https://issues.apache.org/jira/browse/FLINK-16581
> Project: Flink
> Issue Type: Bug
> Components: Table SQL / Runtime
> Affects Versions: 1.9.2, 1.10.0
> Reporter: Jingsong Lee
> Assignee: dalongliu
> Priority: Critical
> Fix For: 1.9.3, 1.10.1, 1.11.0
>
>
> This leads to OOM in long-running streaming jobs.
> We should check all unbounded operations; none of them should lack state TTL.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)