[ https://issues.apache.org/jira/browse/FLINK-27251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17534356#comment-17534356 ]
fanrui commented on FLINK-27251: -------------------------------- Hi [~pnowojski] , thanks for your reply and some useful information. First of all, I think the timeout is an important feature for UC. When the backpressure is not severe, the in-flight data buffer size is small. But we know from FLINK-26803 : Flink will write a file even if the entire ResultPartition has only one buffer. Therefore, a large number of small files will be generated even if the back pressure is not severe. Currently I think the biggest cost of Unaligned Checkpoint compared to Aligned Checkpoint is writing a lot of small files. Without this problem, I think UC can be used as the default option instead of Aligned Checkpoint. Secondly, I think other costs are acceptable, such as the restore time. When the backpressure is not severe, the recovery should be fast. When the backpressure is severe, the slow recovery is better than the CP failure, so the UC is useful. Do you think there are other costs? Sort out these costs can explain why timeout=0 is not suitable for mass production. To summarize, I think when timeout=0 and backpressure is not severe, there will also be a lot of small files. Although the total state size is not large, small files are not very friendly to hdfs. I think the timeout is an important feature for UC, we just use UC when necessary(the backpressure is severe). For code implement, I totally agree with you, It's difficult. It might have some bugs. Actually, I finished the POC, it works well for our test job. When timeout=30s and the back pressure is severe, the UC takes more than 5 minutes for the community version. With this feature, UC is always done within 35s. Let me try to express my idea clearly. * When broadcasting Barrier, if DataType is TIMEOUTABLE_ALIGNED_CHECKPOINT_BARRIER, register timer. ** The time when the timer is triggered is: CP triggerTime + alignmentTimeout ** Behavior when the timer is triggered: snapshot all buffers that have not been sent to the downstream before the Barrier * If the UC is enabled and the Aligned Barrier is received, the CP cannot succeed immediately and needs to wait for the outputBufferFuture: ** When all barriers are sent downstream quickly, and an empty output buffer is returned, the CP ends.(when the back pressure isn't severe) ** Or wait for the timer to trigger, snapshot these output buffers before the barrier, and the CP ends.(when the back pressure is severe) I will submit the code asap. Please correct me if I'm wrong. Thanks. > Timeout aligned to unaligned checkpoint barrier in the output buffers of an > upstream subtask > -------------------------------------------------------------------------------------------- > > Key: FLINK-27251 > URL: https://issues.apache.org/jira/browse/FLINK-27251 > Project: Flink > Issue Type: Improvement > Components: Runtime / Checkpointing > Affects Versions: 1.14.0, 1.15.0 > Reporter: fanrui > Priority: Major > Fix For: 1.16.0 > > > After FLINK-23041, the downstream task can be switched UC when {_}currentTime > - triggerTime > timeout{_}. But the downstream task still needs wait for all > barriers of upstream. > If the back pressure is serve, the downstream task cannot receive all barrier > within CP timeout, causes CP to fail. > > Can we support upstream Task switching from Aligned to UC? It means that when > the barrier cannot be sent from the output buffer to the downstream task > within the > [execution.checkpointing.aligned-checkpoint-timeout|https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/config/#execution-checkpointing-aligned-checkpoint-timeout], > the upstream task switches to UC and takes a snapshot of the data before the > barrier in the output buffer. > > Hi [~akalashnikov] , please help take a look in your free time, thanks a lot. -- This message was sent by Atlassian Jira (v8.20.7#820007)