[ 
https://issues.apache.org/jira/browse/FLINK-10886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flink Jira Bot updated FLINK-10886:
-----------------------------------
    Labels: stale-assigned  (was: )

I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I help 
the community manage its development. I see this issue is assigned but has not 
received an update in 14, so it has been labeled "stale-assigned".
If you are still working on the issue, please remove the label and add a 
comment updating the community on your progress.  If this issue is waiting on 
feedback, please consider this a reminder to the committer/reviewer. Flink is a 
very active project, and so we appreciate your patience.
If you are no longer working on the issue, please unassign yourself so someone 
else may work on it. If the "warning_label" label is not removed in 7 days, the 
issue will be automatically unassigned.


> Event time synchronization across sources
> -----------------------------------------
>
>                 Key: FLINK-10886
>                 URL: https://issues.apache.org/jira/browse/FLINK-10886
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / Common
>            Reporter: Jamie Grier
>            Assignee: Jamie Grier
>            Priority: Major
>              Labels: stale-assigned
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> When reading from a source with many parallel partitions, especially when 
> reading lots of historical data (or recovering from downtime and there is a 
> backlog to read), it's quite common for there to develop an event-time skew 
> across those partitions.
>  
> When doing event-time windowing -- or in fact any event-time driven 
> processing -- the event time skew across partitions results directly in 
> increased buffering in Flink and of course the corresponding state/checkpoint 
> size growth.
>  
> As the event-time skew and state size grows larger this can have a major 
> effect on application performance and in some cases result in a "death 
> spiral" where the application performance get's worse and worse as the state 
> size grows and grows.
>  
> So, one solution to this problem, outside of core changes in Flink itself, 
> seems to be to try to coordinate sources across partitions so that they make 
> progress through event time at roughly the same rate.  In fact if there is 
> large skew the idea would be to slow or even stop reading from some 
> partitions with newer data while first reading the partitions with older 
> data.  Anyway, to do this we need to share state somehow amongst sub-tasks.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to