[ 
https://issues.apache.org/jira/browse/FLINK-11172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16722823#comment-16722823
 ] 

Timo Walther commented on FLINK-11172:
--------------------------------------

As far as I know, eager TTL state is on the short-term roadmap. If that is the 
case, we don't need to put effort into a feature that might soon be obsolete. 
Eager TTL state would also resolve the inconsistent use of the stream query 
config across operators: currently, it is not used for windows and CEP. 
With proper TTL state, we would also unify the behavior of the Table API and 
the DataStream API.

> Remove the max retention time in StreamQueryConfig
> --------------------------------------------------
>
>                 Key: FLINK-11172
>                 URL: https://issues.apache.org/jira/browse/FLINK-11172
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table API & SQL
>    Affects Versions: 1.8.0
>            Reporter: Yangze Guo
>            Assignee: Yangze Guo
>            Priority: Major
>
> [Stream Query 
> Config|https://ci.apache.org/projects/flink/flink-docs-master/dev/table/streaming/query_configuration.html]
>  is an important and useful feature for trading off accuracy against 
> resource consumption when a query runs on unbounded streaming data. 
> This feature was first proposed in 
> [FLINK-6491|https://issues.apache.org/jira/browse/FLINK-6491].
> Initially, *QueryConfig* took two parameters, 
> minIdleStateRetentionTime and maxIdleStateRetentionTime, so that fewer 
> timers have to be registered when there is some freedom about when to 
> discard state. However, this approach can cause new data to expire earlier 
> than old data, leading to greater accuracy loss in some cases. For example, 
> suppose we have an unbounded keyed stream. We process key *_a_* at _*t0*_ 
> and key _*b*_ at _*t1*_, with *_t0 < t1_*. The state of *_a_* will expire at 
> _*t0+maxIdleStateRetentionTime*_, while the state of _*b*_ will expire at 
> *_t1+maxIdleStateRetentionTime_*. Now another record with key *_a_* 
> arrives at _*t2*_ (*_t1 < t2_*), but _*t2+minIdleStateRetentionTime*_ <  
> _*t0+maxIdleStateRetentionTime*_. The state of key *_a_* will still expire 
> at _*t0+maxIdleStateRetentionTime*_, which is earlier than the state of key 
> _*b*_. According to the guideline of 
> [LRU|https://en.wikipedia.org/wiki/Cache_replacement_policies#Least_recently_used_(LRU)],
>  elements that have been most heavily used in the past few instructions are 
> most likely to be used heavily in the next few instructions too, so the state 
> with key _*a*_ should live longer than the state with key _*b*_. The current 
> approach contradicts this idea.
> I think we now have a good opportunity to remove the maxIdleStateRetentionTime 
> argument from *StreamQueryConfig*. My reasons are below.
>  * [FLINK-9423|https://issues.apache.org/jira/browse/FLINK-9423] implemented 
> efficient deletes for the heap-based timer service. We can leverage the 
> deletion operation to mitigate excessive timer registration.
>  * The current approach can cause new data to expire earlier than old data, 
> leading to greater accuracy loss in some cases. Users need to fine-tune the 
> two parameters to avoid this scenario. Directly following the idea of LRU 
> looks like a better solution.
> So, I plan to remove maxIdleStateRetentionTime and make the expiration time 
> depend only on _*minIdleStateRetentionTime*_.
> cc to [~sunjincheng121], [~fhueske] 
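
The expiry scenario in the description above can be sketched as a small simulation. This is a minimal sketch in plain Java, not Flink's implementation: the class IdleStateRetentionSketch, its access method, and the concrete numbers (t0=0, t1=5, t2=20, retention times 50 and 100) are hypothetical. It assumes the cleanup time of a key is only rescheduled when the access time plus the minimum retention exceeds the cleanup time already scheduled, as the description implies, and it models the proposed min-only behavior by setting both retention values to the same number.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical simulation of idle state retention; not Flink code. */
public class IdleStateRetentionSketch {
    private final long minRetention;
    private final long maxRetention;
    // Scheduled cleanup time per key (stands in for a registered timer).
    private final Map<String, Long> cleanupTime = new HashMap<>();

    public IdleStateRetentionSketch(long minRetention, long maxRetention) {
        this.minRetention = minRetention;
        this.maxRetention = maxRetention;
    }

    /** Records an access to key at time now; returns the key's scheduled cleanup time. */
    public long access(String key, long now) {
        Long current = cleanupTime.get(key);
        // Reschedule only if the existing cleanup would violate the minimum retention.
        if (current == null || now + minRetention > current) {
            current = now + maxRetention;
            cleanupTime.put(key, current);
        }
        return current;
    }

    public static void main(String[] args) {
        // Two-parameter behavior: min = 50, max = 100.
        IdleStateRetentionSketch minMax = new IdleStateRetentionSketch(50, 100);
        minMax.access("a", 0);              // a at t0 = 0
        long b1 = minMax.access("b", 5);    // b at t1 = 5
        long a1 = minMax.access("a", 20);   // a again at t2 = 20, not rescheduled
        System.out.println("min/max:  a=" + a1 + " b=" + b1);

        // Assumed min-only behavior: every access reschedules to now + 50.
        IdleStateRetentionSketch minOnly = new IdleStateRetentionSketch(50, 50);
        minOnly.access("a", 0);
        long b2 = minOnly.access("b", 5);
        long a2 = minOnly.access("a", 20);
        System.out.println("min-only: a=" + a2 + " b=" + b2);
    }
}
```

With these numbers, the two-parameter variant keeps the cleanup of key a at 100 and schedules key b at 105, so the more recently used key a expires first. The min-only variant schedules a at 70 and b at 55, which matches the LRU ordering argued for in the issue.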



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
