[
https://issues.apache.org/jira/browse/FLINK-16581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17064864#comment-17064864
]
Jark Wu edited comment on FLINK-16581 at 3/23/20, 3:08 PM:
-----------------------------------------------------------
Hi [~lsy], thanks for the contribution. I glanced at the pull request you
submitted, but I would like to discuss the approach here first.
Currently, there are 2 ways to clean up state:
1) Registering a processing-time timer and cleaning up entries when the timer
fires.
- pros: can clean up multiple states at the same time (the states stay consistent)
- cons: timer space depends on the key size, which may lead to OOM (heap
timers).
- used in Group Aggregation, Over Aggregation, TopN
2) Using the {{StateTtlConfig}} provided by DataStream.
- pros: decouples the state TTL logic from the record processing, so it is easy
to program (take a look at the old planner's NonWindowJoin, which bundles the
TTL timestamp with records in MapState).
- cons: can't clean up multiple states at the same time.
- used in Stream-Stream Joins.
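To make the trade-off between the two options concrete, here is a toy model in plain Java. It is not Flink code and does not use the Flink API: {{StateCleanupSketch}}, {{TimerCleanup}}, {{TtlState}} and the two example states ({{lastRow}}, {{count}}) are names made up for illustration, and time is passed in by hand instead of coming from a timer service. It is only a sketch of the behaviour described above, under those assumptions.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Toy model only: plain Java, not Flink code. All names here are hypothetical.
final class StateCleanupSketch {

    /** Approach 1: one cleanup timer per key; when it fires, all states of the key go together. */
    static final class TimerCleanup {
        final Map<String, String> lastRow = new HashMap<>();
        final Map<String, Long> count = new HashMap<>();
        final Map<String, Long> timers = new HashMap<>(); // one entry per live key
        private final long retentionMs;

        TimerCleanup(long retentionMs) { this.retentionMs = retentionMs; }

        void updateLastRow(String key, String row, long now) {
            lastRow.put(key, row);
            timers.put(key, now + retentionMs); // any update pushes the cleanup time back
        }

        void updateCount(String key, long now) {
            count.merge(key, 1L, Long::sum);
            timers.put(key, now + retentionMs);
        }

        /** Fires every timer that is due and clears both states for its key. */
        void advanceTo(long now) {
            Iterator<Map.Entry<String, Long>> it = timers.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<String, Long> timer = it.next();
                if (timer.getValue() <= now) {
                    lastRow.remove(timer.getKey());
                    count.remove(timer.getKey());
                    it.remove();
                }
            }
        }
    }

    /** Approach 2: every state tracks its own write time and expires independently. */
    static final class TtlState<V> {
        private final Map<String, V> values = new HashMap<>();
        private final Map<String, Long> lastWrite = new HashMap<>();
        private final long ttlMs;

        TtlState(long ttlMs) { this.ttlMs = ttlMs; }

        void put(String key, V value, long now) {
            values.put(key, value);
            lastWrite.put(key, now);
        }

        /** Returns null if the entry is missing or older than the TTL (dropped lazily on read). */
        V get(String key, long now) {
            Long written = lastWrite.get(key);
            if (written == null) {
                return null;
            }
            if (now - written >= ttlMs) {
                values.remove(key);
                lastWrite.remove(key);
                return null;
            }
            return values.get(key);
        }
    }

    public static void main(String[] args) {
        TimerCleanup timer = new TimerCleanup(100);
        timer.updateLastRow("k", "r1", 0);
        timer.updateCount("k", 60);
        timer.advanceTo(120);
        System.out.println("timer: " + timer.lastRow.get("k") + " / " + timer.count.get("k"));

        TtlState<String> lastRow = new TtlState<>(100);
        TtlState<Long> count = new TtlState<>(100);
        lastRow.put("k", "r1", 0);
        count.put("k", 1L, 60);
        System.out.println("ttl:   " + lastRow.get("k", 120) + " / " + count.get("k", 120));
    }
}
```

With a retention of 100 ms and a read at t=120, the timer model still holds both states, because the second update moved the timer for the key to t=160. The TTL model has already dropped the last row but still returns the count. That is the "can't clean up multiple states at the same time" point. In return, the TTL model has no {{timers}} map to keep per key, and the expiry logic stays out of the update methods.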
Personally, I prefer using {{StateTtlConfig}}, which leverages the ability of
DataStream instead of reinventing the same thing. Besides, it helps improve
the readability of the code (and reduces bugs). What do you think, [~lzljs3620320]
[~lsy]?
> Minibatch deduplication lack state TTL
> --------------------------------------
>
> Key: FLINK-16581
> URL: https://issues.apache.org/jira/browse/FLINK-16581
> Project: Flink
> Issue Type: Bug
> Components: Table SQL / Runtime
> Affects Versions: 1.9.2, 1.10.0
> Reporter: Jingsong Lee
> Assignee: dalongliu
> Priority: Critical
> Fix For: 1.9.3, 1.10.1, 1.11.0
>
>
> This leads to OOM in long-running streaming jobs.
> We should check all unbounded operations; none of them should lack state TTL.