[ https://issues.apache.org/jira/browse/FLINK-11172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Flink Jira Bot updated FLINK-11172:
-----------------------------------
      Labels: auto-deprioritized-major auto-unassigned  (was: auto-unassigned stale-major)
    Priority: Minor  (was: Major)

This issue was labeled "stale-major" 7 days ago and has not received any updates so it is being deprioritized. If this ticket is actually Major, please raise the priority and ask a committer to assign you the issue or revive the public discussion.

> Remove the max retention time in StreamQueryConfig
> --------------------------------------------------
>
>                 Key: FLINK-11172
>                 URL: https://issues.apache.org/jira/browse/FLINK-11172
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table SQL / API
>    Affects Versions: 1.8.0
>            Reporter: Yangze Guo
>            Priority: Minor
>              Labels: auto-deprioritized-major, auto-unassigned
>
> [Stream Query Config|https://ci.apache.org/projects/flink/flink-docs-master/dev/table/streaming/query_configuration.html]
> is an important and useful feature for trading off accuracy against
> resource consumption when a query runs on unbounded streaming data.
> This feature was first proposed in
> [FLINK-6491|https://issues.apache.org/jira/browse/FLINK-6491].
>
> Originally, *QueryConfig* took two parameters, minIdleStateRetentionTime
> and maxIdleStateRetentionTime, to avoid registering many timers by allowing
> more freedom in when to discard state. However, this approach can cause
> newer data to expire earlier than older data, and thus cause greater
> accuracy loss in some cases. For example, suppose we have an unbounded keyed
> stream. We process key *_a_* at *_t0_* and key *_b_* at *_t1_*, with
> *_t0 < t1_*. The state of *_a_* will expire at
> *_t0+maxIdleStateRetentionTime_*, while the state of *_b_* will expire at
> *_t1+maxIdleStateRetentionTime_*. Now another record with key *_a_* arrives
> at *_t2_* (*_t1 < t2_*), but *_t2+minIdleStateRetentionTime_* <
> *_t0+maxIdleStateRetentionTime_*. The state of key *_a_* will still expire
> at *_t0+maxIdleStateRetentionTime_*, which is earlier than the state of key
> *_b_*. According to the
> [LRU|https://en.wikipedia.org/wiki/Cache_replacement_policies#Least_recently_used_(LRU)]
> guideline, elements that have been used most heavily in the past few
> instructions are most likely to be used heavily in the next few
> instructions too. The state of key *_a_* should therefore live longer than
> the state of key *_b_*. The current approach goes against this idea.
>
> I think we now have a good chance to remove the maxIdleStateRetentionTime
> argument in *StreamQueryConfig*. Below are my reasons.
> * [FLINK-9423|https://issues.apache.org/jira/browse/FLINK-9423] implemented
> efficient deletes for the heap-based timer service. We can leverage the
> delete operation to mitigate excessive timer registration.
> * The current approach can cause newer data to expire earlier than older
> data, and thus cause greater accuracy loss in some cases. Users need to
> fine-tune these two parameters to avoid this scenario. Directly following
> the idea of LRU looks like a better solution.
>
> So, I plan to remove maxIdleStateRetentionTime and make the expiration time
> depend only on *_minIdleStateRetentionTime_*.
>
> cc to [~sunjincheng121], [~fhueske]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
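
The expiry scenario in the description can be replayed with a small simulation. This is only a sketch of the behavior as the issue describes it, not Flink's actual code: the function names (`current_cleanup_time`, `proposed_cleanup_time`, `replay`), the retention values (10 and 20) and the timestamps (t0=0, t1=5, t2=8) are all hypothetical and chosen purely for illustration.

```python
# Sketch only: models the cleanup-time rules described in the issue text.
# All names and numbers here are hypothetical, not taken from Flink.

def current_cleanup_time(prev_cleanup, now, min_retention, max_retention):
    """Min/max policy as described: re-register a timer only when the
    existing cleanup time no longer covers now + min_retention."""
    if prev_cleanup is None or now + min_retention > prev_cleanup:
        return now + max_retention
    return prev_cleanup  # existing timer is kept, state is not extended


def proposed_cleanup_time(now, min_retention):
    """Proposed policy: every access moves expiry to now + min_retention
    (the old timer would be deleted and a new one registered)."""
    return now + min_retention


def replay(accesses, policy):
    """Apply a policy to (key, timestamp) accesses; return expiry per key."""
    cleanup = {}
    for key, now in accesses:
        cleanup[key] = policy(cleanup.get(key), now)
    return cleanup


MIN_RETENTION, MAX_RETENTION = 10, 20      # assumed values
ACCESSES = [("a", 0), ("b", 5), ("a", 8)]  # a at t0, b at t1, a again at t2

current = replay(
    ACCESSES,
    lambda prev, now: current_cleanup_time(prev, now, MIN_RETENTION, MAX_RETENTION),
)
proposed = replay(
    ACCESSES,
    lambda prev, now: proposed_cleanup_time(now, MIN_RETENTION),
)

print(current)   # {'a': 20, 'b': 25}: a expires before b despite the later access
print(proposed)  # {'a': 18, 'b': 15}: the most recently used key lives longer
```

With these assumed numbers, the min/max policy leaves key a expiring at 20, before key b at 25, even though a was touched last; the min-only policy restores the LRU ordering at the cost of one timer update per access.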