[ https://issues.apache.org/jira/browse/FLINK-27251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17538730#comment-17538730 ]
Piotr Nowojski commented on FLINK-27251:
----------------------------------------

Thanks [~fanrui] for the update. I will take a look :)

{quote}
But I don't understand "when that thread is blocked by the timeout, its queue of requests should be completely empty." Could you share more details? Which thread? Is it the ChannelStateWriteThread?
{quote}

Yes, I meant the {{ChannelStateWriterThread}}. If we are enqueuing a timeoutable, but not yet timed-out, checkpoint barrier on the outputs, it means that we have already received AND processed ALL of the checkpoint barriers on the input channels. In other words, under no circumstances will there be a need to spill/persist any in-flight data from the outputs for this checkpoint.

So if we block the {{ChannelStateWriterThread}} of this subtask while waiting for the future (for the checkpoint barriers either to time out on the output or to be sent to the downstream task), this {{ChannelStateWriterThread}} has nothing else to do anyway, so it doesn't matter whether we block it or not. New write requests to this {{ChannelStateWriterThread}} can only arrive for the next checkpoint, and that won't happen until the current checkpoint completes.

> Timeout aligned to unaligned checkpoint barrier in the output buffers of an upstream subtask
> --------------------------------------------------------------------------------------------
>
>                 Key: FLINK-27251
>                 URL: https://issues.apache.org/jira/browse/FLINK-27251
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.14.0, 1.15.0
>            Reporter: fanrui
>            Assignee: fanrui
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.16.0
>
>
> After FLINK-23041, the downstream task can switch to UC (unaligned checkpoint) when {_}currentTime - triggerTime > timeout{_}. But the downstream task still needs to wait for all barriers from upstream.
> If the back pressure is severe, the downstream task cannot receive all barriers within the CP timeout, causing the CP to fail.
>
> Can we support switching the upstream task from aligned to UC? That is, when the barrier cannot be sent from the output buffer to the downstream task within [execution.checkpointing.aligned-checkpoint-timeout|https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/config/#execution-checkpointing-aligned-checkpoint-timeout], the upstream task switches to UC and takes a snapshot of the data ahead of the barrier in the output buffer.
>
> Hi [~akalashnikov], please take a look when you have time. Thanks a lot!

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
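To make Piotr's argument about blocking the {{ChannelStateWriterThread}} concrete, here is a deliberately simplified Java model. Everything in it is hypothetical: {{BarrierTimeoutSketch}}, {{awaitOutputBarrier}} and {{runOnWriterThread}} are illustrative names, not Flink classes, and Flink does not necessarily implement the wait as a blocking {{CompletableFuture.get}} with a timeout. A single-threaded executor stands in for one subtask's writer thread; the task it runs blocks on a future that completes once the barrier leaves the output buffer, and falls back to an unaligned snapshot if the aligned-checkpoint timeout expires first. The switch rule from the issue description ({_}currentTime - triggerTime > timeout{_}) is included as a separate helper.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical, simplified model of the FLINK-27251 discussion; these are NOT Flink APIs.
class BarrierTimeoutSketch {

    enum Outcome { SENT_ALIGNED, SWITCHED_TO_UNALIGNED }

    // Switch rule quoted in the issue description: currentTime - triggerTime > timeout.
    static boolean shouldSwitchToUnaligned(long currentTimeMs, long triggerTimeMs, long timeoutMs) {
        return currentTimeMs - triggerTimeMs > timeoutMs;
    }

    // Runs on the single writer thread and blocks until the barrier leaves the output
    // buffer or the aligned-checkpoint timeout expires. Blocking is assumed harmless here:
    // all input barriers of this checkpoint were already processed, so no other request
    // for this checkpoint can be waiting in the writer's queue.
    static Outcome awaitOutputBarrier(CompletableFuture<Void> barrierSent, long timeoutMs)
            throws InterruptedException {
        try {
            barrierSent.get(timeoutMs, TimeUnit.MILLISECONDS);
            return Outcome.SENT_ALIGNED;
        } catch (TimeoutException e) {
            // In the proposal, the data ahead of the barrier in the output buffer
            // would be snapshotted at this point.
            return Outcome.SWITCHED_TO_UNALIGNED;
        } catch (ExecutionException e) {
            throw new IllegalStateException("barrier could not be sent", e.getCause());
        }
    }

    // Submits the wait to a single-threaded executor standing in for the writer thread.
    static Outcome runOnWriterThread(CompletableFuture<Void> barrierSent, long timeoutMs)
            throws Exception {
        ExecutorService writerThread = Executors.newSingleThreadExecutor();
        try {
            return writerThread.submit(() -> awaitOutputBarrier(barrierSent, timeoutMs)).get();
        } finally {
            writerThread.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Back-pressured downstream never consumes the barrier in time.
        System.out.println(runOnWriterThread(new CompletableFuture<>(), 50));
        // Downstream has already consumed the barrier.
        System.out.println(runOnWriterThread(CompletableFuture.completedFuture(null), 50));
        // 200 ms since trigger exceeds a 100 ms aligned timeout.
        System.out.println(shouldSwitchToUnaligned(1_200, 1_000, 100));
    }
}
```

Running {{main}} prints {{SWITCHED_TO_UNALIGNED}}, then {{SENT_ALIGNED}}, then {{true}}. In this model, occupying the only writer thread is acceptable solely because of the invariant described in the comment: requests for the next checkpoint cannot arrive before the current checkpoint completes.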