[
https://issues.apache.org/jira/browse/FLINK-27251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17534356#comment-17534356
]
fanrui commented on FLINK-27251:
--------------------------------
Hi [~pnowojski], thanks for your reply and the useful information.
First of all, I think the timeout is an important feature for UC. When the
backpressure is not severe, the amount of in-flight buffer data is small. But we
know from FLINK-26803 that Flink writes a file even if the entire
ResultPartition has only one buffer. Therefore, a large number of small files
will be generated even if the backpressure is not severe.
Currently I think the biggest cost of Unaligned Checkpoint compared to Aligned
Checkpoint is writing a lot of small files. Without this problem, I think UC
can be used as the default option instead of Aligned Checkpoint.
Secondly, I think the other costs are acceptable, such as the restore time. When
the backpressure is not severe, recovery should be fast. When the
backpressure is severe, a slow recovery is better than a CP failure, so
UC is still useful.
Do you think there are other costs? Sorting out these costs would help explain
why timeout=0 is not suitable for large-scale production use.
To summarize, I think that when timeout=0 and the backpressure is not severe,
there will still be a lot of small files. Although the total state size is not
large, small files are not very friendly to HDFS. I think the timeout is an
important feature for UC: we only use UC when necessary (when the backpressure
is severe).
As for the code implementation, I totally agree with you: it's difficult and it
might have some bugs. Actually, I have finished the POC, and it works well for
our test job. When timeout=30s and the backpressure is severe, the UC takes more
than 5 minutes on the community version. With this feature, the UC is always
done within 35s.
Let me try to express my idea clearly.
* When broadcasting the barrier, if the DataType is
TIMEOUTABLE_ALIGNED_CHECKPOINT_BARRIER, register a timer.
** The timer fires at: CP triggerTime + alignmentTimeout
** When the timer fires: snapshot all buffers queued before the barrier that
have not yet been sent downstream
* If UC is enabled and an aligned barrier is received, the CP cannot
succeed immediately and needs to wait for the outputBufferFuture:
** If all barriers are sent downstream quickly, an empty output buffer
is returned and the CP ends (when the backpressure isn't severe).
** Otherwise, wait for the timer to fire, snapshot the output buffers before
the barrier, and the CP ends (when the backpressure is severe).
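To make the two paths above concrete, here is a minimal toy model in plain Java. It is only a sketch and not the POC or any Flink code: the class UpstreamOutputSketch and all of its methods are hypothetical names, the clock is simulated by passing timestamps in by hand, and it handles a single checkpoint for a single output queue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model of one upstream subtask's output queue. Hypothetical; not Flink code. */
class UpstreamOutputSketch {
    static final String BARRIER = "<barrier>";

    private final Deque<String> queue = new ArrayDeque<>();
    private long timerDeadline = -1L;      // -1 means no timer is registered
    private boolean checkpointDone = false;
    private List<String> snapshot = null;  // persisted output buffers, set on completion

    /** Queue a data buffer for the downstream task. */
    void emit(String buffer) {
        queue.addLast(buffer);
    }

    /** Broadcast a timeoutable aligned barrier and register the timer. */
    void broadcastBarrier(long triggerTime, long alignmentTimeout) {
        queue.addLast(BARRIER);
        timerDeadline = triggerTime + alignmentTimeout;
    }

    /** Downstream consumes one element; if it is the barrier, the CP ends with no persisted buffers. */
    void pollDownstream() {
        String head = queue.pollFirst();
        if (BARRIER.equals(head) && !checkpointDone) {
            complete(new ArrayList<>());
        }
    }

    /** Advance the simulated clock; if the deadline has passed, snapshot the buffers before the barrier. */
    void advanceTo(long now) {
        if (checkpointDone || timerDeadline < 0 || now < timerDeadline) {
            return;
        }
        List<String> pending = new ArrayList<>();
        for (String element : queue) {  // iterates from head (oldest) to tail
            if (BARRIER.equals(element)) {
                break;
            }
            pending.add(element);
        }
        complete(pending);
    }

    private void complete(List<String> persisted) {
        snapshot = persisted;
        checkpointDone = true;
        timerDeadline = -1L;
    }

    boolean isCheckpointDone() {
        return checkpointDone;
    }

    List<String> getSnapshot() {
        return snapshot;
    }
}
```

In this model, if the downstream polls the barrier before the deadline, the CP ends with an empty snapshot. If the queue is stuck and advanceTo is called with a time at or past triggerTime + alignmentTimeout, the CP ends with the buffers that were queued ahead of the barrier. Buffers emitted after the barrier are never part of the snapshot.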
I will submit the code asap. Please correct me if I'm wrong. Thanks.
> Timeout aligned to unaligned checkpoint barrier in the output buffers of an
> upstream subtask
> --------------------------------------------------------------------------------------------
>
> Key: FLINK-27251
> URL: https://issues.apache.org/jira/browse/FLINK-27251
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Checkpointing
> Affects Versions: 1.14.0, 1.15.0
> Reporter: fanrui
> Priority: Major
> Fix For: 1.16.0
>
>
> After FLINK-23041, the downstream task can switch to UC when {_}currentTime
> - triggerTime > timeout{_}. But the downstream task still needs to wait for
> all barriers from upstream.
> If the backpressure is severe, the downstream task cannot receive all barriers
> within the CP timeout, which causes the CP to fail.
>
> Can we support upstream Task switching from Aligned to UC? It means that when
> the barrier cannot be sent from the output buffer to the downstream task
> within the
> [execution.checkpointing.aligned-checkpoint-timeout|https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/config/#execution-checkpointing-aligned-checkpoint-timeout],
> the upstream task switches to UC and takes a snapshot of the data before the
> barrier in the output buffer.
>
> Hi [~akalashnikov], please take a look when you have time, thanks a lot.