[ 
https://issues.apache.org/jira/browse/FLINK-27251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17534356#comment-17534356
 ] 

fanrui commented on FLINK-27251:
--------------------------------

Hi [~pnowojski] , thanks for your reply and some useful information.

First of all, I think the timeout is an important feature for UC. When the 
backpressure is not severe, the in-flight data buffer size is small. But we 
know from FLINK-26803 : Flink will write a file even if the entire 
ResultPartition has only one buffer. Therefore, a large number of small files 
will be generated even if the back pressure is not severe.

Currently I think the biggest cost of Unaligned Checkpoint compared to Aligned 
Checkpoint is writing a lot of small files. Without this problem, I think UC 
can be used as the default option instead of Aligned Checkpoint.

Secondly, I think other costs are acceptable, such as the restore time. When 
the backpressure is not severe, the recovery should be fast. When the 
backpressure is severe, the slow recovery is better than the CP failure, so the 
UC is useful.

Do you think there are other costs? Sort out these costs can explain why 
timeout=0 is not suitable for mass production.

To summarize, I think when timeout=0 and backpressure is not severe, there will 
also be a lot of small files. Although the total state size is not large, small 
files are not very friendly to hdfs. I think the timeout is an important 
feature for UC, we just use UC when necessary(the backpressure is severe).

 

 

 

For code implement, I totally agree with you, It's difficult. It might have 
some bugs. Actually, I finished the POC, it works well for our test job. When 
timeout=30s and the back pressure is severe, the UC takes more than 5 minutes 
for the community version. With this feature, UC is always done within 35s.

Let me try to express my idea clearly.
 * When broadcasting Barrier, if DataType is 
TIMEOUTABLE_ALIGNED_CHECKPOINT_BARRIER, register timer.
 ** The time when the timer is triggered is: CP triggerTime + alignmentTimeout
 ** Behavior when the timer is triggered: snapshot all buffers that have not 
been sent to the downstream before the Barrier
 * If the UC is enabled and the Aligned Barrier is received, the CP cannot 
succeed immediately and needs to wait for the outputBufferFuture:
 ** When all barriers are sent downstream quickly, and an empty output buffer 
is returned, the CP ends.(when the back pressure isn't severe)
 ** Or wait for the timer to trigger, snapshot these output buffers before the 
barrier, and the CP ends.(when the back pressure is severe)

I will submit the code asap. Please correct me if I'm wrong. Thanks.

> Timeout aligned to unaligned checkpoint barrier in the output buffers of an 
> upstream subtask
> --------------------------------------------------------------------------------------------
>
>                 Key: FLINK-27251
>                 URL: https://issues.apache.org/jira/browse/FLINK-27251
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.14.0, 1.15.0
>            Reporter: fanrui
>            Priority: Major
>             Fix For: 1.16.0
>
>
> After FLINK-23041, the downstream task can be switched UC when {_}currentTime 
> - triggerTime > timeout{_}. But the downstream task still needs wait for all 
> barriers of upstream. 
> If the back pressure is serve, the downstream task cannot receive all barrier 
> within CP timeout, causes CP to fail.
>  
> Can we support upstream Task switching from Aligned to UC? It means that when 
> the barrier cannot be sent from the output buffer to the downstream task 
> within the 
> [execution.checkpointing.aligned-checkpoint-timeout|https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/config/#execution-checkpointing-aligned-checkpoint-timeout],
>  the upstream task switches to UC and takes a snapshot of the data before the 
> barrier in the output buffer.
>  
> Hi [~akalashnikov] , please help take a look in your free time, thanks a lot.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

Reply via email to