[jira] [Commented] (FLINK-11172) Remove the max retention time in StreamQueryConfig

Yangze Guo (JIRA) Mon, 17 Dec 2018 01:04:47 -0800


    [ 
https://issues.apache.org/jira/browse/FLINK-11172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16722795#comment-16722795
 ]


Yangze Guo commented on FLINK-11172:
------------------------------------

I agree with [~hequn8128] that we can replace the logic with TtlState. But I 
think that should be a long-term goal, one we cannot achieve in the near future.

My concerns are:
 * If State TTL is enabled, state storage consumption will increase.
 * Currently, expired values are only removed when they are read 
explicitly. If we use TtlState, we should implement an eager deletion mode first.
 * Other limitations of State TTL mentioned 
[here|https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/state/state.html#state-time-to-live-ttl]
 would carry over to the operators supported by StreamQueryConfig.
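
To illustrate the first two concerns, here is a minimal sketch of a store that 
only cleans up expired values on read. It is a toy model, not Flink's TtlState; 
the class and method names (LazyTtlMap, put, get, stored_keys) are hypothetical.

```python
class LazyTtlMap:
    """Toy model of read-time-only TTL cleanup (hypothetical, not a Flink API)."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.entries = {}  # key -> (value, last_write_time)

    def put(self, key, value, now):
        # Every value carries a timestamp, which is the extra storage cost.
        self.entries[key] = (value, now)

    def get(self, key, now):
        entry = self.entries.get(key)
        if entry is None:
            return None
        value, written_at = entry
        if now - written_at >= self.ttl:
            # An expired entry is dropped only here, on an explicit read.
            del self.entries[key]
            return None
        return value

    def stored_keys(self):
        # Counts entries still held, including expired ones never read again.
        return len(self.entries)
```

In this model, a key that is written once and never read again stays in storage 
indefinitely, which is why an eager deletion mode would be needed first.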

In my opinion, we can fix the current issue first and treat the migration as a long-term plan.

Besides, we need to involve [~azagrebin] in the discussion, since there are 
some State TTL improvements in progress that we need to take into account, e.g. 
[FLINK-10473|https://issues.apache.org/jira/browse/FLINK-10473].

> Remove the max retention time in StreamQueryConfig
> --------------------------------------------------
>
>                 Key: FLINK-11172
>                 URL: https://issues.apache.org/jira/browse/FLINK-11172
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table API & SQL
>    Affects Versions: 1.8.0
>            Reporter: Yangze Guo
>            Assignee: Yangze Guo
>            Priority: Major
>
> [Stream Query 
> Config|https://ci.apache.org/projects/flink/flink-docs-master/dev/table/streaming/query_configuration.html]
>  is an important and useful feature for trading off accuracy against 
> resource consumption when a query is executed on unbounded streaming data. 
> This feature was first proposed in 
> [FLINK-6491|https://issues.apache.org/jira/browse/FLINK-6491].
> Initially, *QueryConfig* was given two parameters, 
> minIdleStateRetentionTime and maxIdleStateRetentionTime, to avoid registering 
> many timers by allowing more freedom in when to discard state. However, this 
> approach may cause new data to expire earlier than old data, and thus cause 
> greater accuracy loss in some cases. For example, suppose we have an 
> unbounded keyed stream. We process key *_a_* at _*t0*_ and key _*b*_ at 
> _*t1*_, where *_t0 < t1_*. The state of *_a_* will expire at 
> _*t0+maxIdleStateRetentionTime*_, while the state of _*b*_ will expire at 
> _*t1+maxIdleStateRetentionTime*_. Now another record with key *_a_* arrives 
> at _*t2*_ (*_t1 < t2_*), but _*t2+minIdleStateRetentionTime*_ < 
> _*t0+maxIdleStateRetentionTime*_. The state of key *_a_* will still expire 
> at _*t0+maxIdleStateRetentionTime*_, which is earlier than the state of key 
> _*b*_. The guideline of 
> [LRU|https://en.wikipedia.org/wiki/Cache_replacement_policies#Least_recently_used_(LRU)]
>  is that elements that have been most heavily used in the past few 
> instructions are most likely to be used heavily in the next few instructions 
> too, so the state with key _*a*_ should live longer than the state with key 
> _*b*_. The current approach goes against this idea.
> I think we now have a good chance to remove the maxIdleStateRetentionTime 
> argument in *StreamQueryConfig.* Below are my reasons.
>  * [FLINK-9423|https://issues.apache.org/jira/browse/FLINK-9423] implemented 
> efficient deletes for the heap-based timer service. We can leverage the 
> delete operation to mitigate excessive timer registration.
>  * The current approach can cause new data to expire earlier than old data, 
> and thus cause greater accuracy loss in some cases. Users need to fine-tune 
> these two parameters to avoid this scenario. Directly following the idea of 
> LRU looks like a better solution.
> So I plan to remove maxIdleStateRetentionTime and make the expiration time 
> depend only on _*minIdleStateRetentionTime*_.
> cc to [~sunjincheng121], [~fhueske] 
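
The scenario in the issue description above can be modelled with a small 
simulation. This is only a sketch of the behavior as described, not Flink code: 
the function names are hypothetical, times are plain integers, and timers 
firing between events are ignored.

```python
def cleanup_time_min_max(events, min_retention, max_retention):
    """Cleanup times under the min/max scheme as described in the issue.

    The cleanup time of a key is moved only when now + min_retention
    exceeds it, and is then set to now + max_retention.
    """
    cleanup = {}
    for key, now in events:
        current = cleanup.get(key)
        if current is None or now + min_retention > current:
            cleanup[key] = now + max_retention
    return cleanup


def cleanup_time_min_only(events, min_retention):
    """Cleanup times under the proposal: every access resets the expiry."""
    cleanup = {}
    for key, now in events:
        cleanup[key] = now + min_retention
    return cleanup


# Key a at t0=0, key b at t1=5, key a again at t2=8.
events = [("a", 0), ("b", 5), ("a", 8)]
old = cleanup_time_min_max(events, min_retention=10, max_retention=20)
new = cleanup_time_min_only(events, min_retention=10)
print(old)  # {'a': 20, 'b': 25}
print(new)  # {'a': 18, 'b': 15}
```

With the assumed values min=10 and max=20, the min/max scheme expires key a 
before key b even though a was accessed more recently. With only the minimum 
retention time, a outlives b, which matches the LRU idea.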



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
