[ https://issues.apache.org/jira/browse/FLINK-18050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17128958#comment-17128958 ]
Zhijiang edited comment on FLINK-18050 at 6/9/20, 7:36 AM:
-----------------------------------------------------------
Merged in master:
ed7b0b1bea84a10ee45d10343f239cd183659a74,
f2dd4b8500a82532dae17087c227ce34e1aeac9b
Merged in release-1.11:
a233c0ff82273ca59bb1decdb1ffb6020d27ccfd,
822e01b613b0b6821383f3cd5b0357054242b6a9
> Fix the bug of recycling buffers twice on exception in
> ChannelStateWriteRequestDispatcher#dispatch
> ---------------------------------------------------------------------------------------------------
>
> Key: FLINK-18050
> URL: https://issues.apache.org/jira/browse/FLINK-18050
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Checkpointing
> Affects Versions: 1.11.0
> Reporter: Zhijiang
> Assignee: Roman Khachatryan
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 1.11.0, 1.12.0
>
>
> When a task finishes, the `CheckpointBarrierUnaligner` declines the current
> checkpoint, which writes an abort request into the `ChannelStateWriter`.
> The abort request is executed before the other write-output requests already
> in the queue and closes the underlying `CheckpointStateOutputStream`. When the
> dispatcher then executes the next write-output request, accessing the closed
> stream throws a ClosedByInterruptException, which makes the dispatcher thread
> exit.
> In this process, the underlying buffers of the current write-output request
> are recycled twice:
> * ChannelStateCheckpointWriter#write recycles all the buffers in its finally
> block, which covers both the normal and the exceptional case.
> * ChannelStateWriteRequestDispatcherImpl#dispatch calls `request.cancel(e)`
> on exception, which recycles the same buffers a second time.
> As a result, the network shuffle process, which references the same buffers,
> can hit a further exception; that exception is then sent to the downstream
> side and makes it fail.
>
> This bug can easily be reproduced by running
> UnalignedCheckpointITCase#shouldPerformUnalignedCheckpointOnParallelRemoteChannel.
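The double-recycle path in the description can be illustrated with a small, self-contained model. This is a hypothetical sketch and not Flink code: `RefCountedBuffer`, `WriteRequest`, `write` and `dispatch` are invented stand-ins that only mimic the control flow (a release in the writer's `finally` block plus a second release in the dispatcher's exception handler). The buffer starts with one reference held by the network shuffle side, and the channel state writer takes a second one. The `guarded` flag shows one possible way to make the release idempotent. It is an assumption for illustration and is not necessarily how the merged commits fixed the issue.

```java
import java.io.IOException;
import java.nio.channels.ClosedByInterruptException;
import java.util.List;

// Hypothetical model of FLINK-18050; none of these types are Flink's real API.
public class DoubleRecycleSketch {

    /** Simplified reference-counted buffer (invented, not Flink's NetworkBuffer). */
    static final class RefCountedBuffer {
        private int refCount = 1; // reference held by the network shuffle side

        void retain() { refCount++; }

        void recycle() {
            if (refCount <= 0) {
                throw new IllegalStateException("recycled a buffer that is already released");
            }
            refCount--;
        }

        int refCount() { return refCount; }
    }

    /** Write request owning references to its buffers. */
    static final class WriteRequest {
        private final List<RefCountedBuffer> buffers;
        private final boolean guarded; // true: ignore any release after the first
        private boolean released;

        WriteRequest(List<RefCountedBuffer> buffers, boolean guarded) {
            this.buffers = buffers;
            this.guarded = guarded;
        }

        void releaseBuffers() {
            if (guarded && released) {
                return; // already released once; a second call is a no-op
            }
            released = true;
            buffers.forEach(RefCountedBuffer::recycle);
        }
    }

    /** Mimics the writer: it always releases in finally, on success or failure. */
    static void write(WriteRequest request, boolean streamClosed) throws IOException {
        try {
            if (streamClosed) {
                // Stands in for the exception raised after the abort closed the stream.
                throw new ClosedByInterruptException();
            }
            // A real writer would serialize the buffers here.
        } finally {
            request.releaseBuffers(); // release #1
        }
    }

    /** Mimics the dispatcher: on failure it cancels the request, releasing again. */
    static void dispatch(WriteRequest request, boolean streamClosed) {
        try {
            write(request, streamClosed);
        } catch (IOException e) {
            request.releaseBuffers(); // release #2 (the cancel(e) path)
        }
    }

    /** Runs the abort-then-write scenario and returns the shared buffer. */
    static RefCountedBuffer run(boolean guarded) {
        RefCountedBuffer buffer = new RefCountedBuffer(); // refCount 1: shuffle side
        buffer.retain();                                   // refCount 2: channel state writer
        dispatch(new WriteRequest(List.of(buffer), guarded), true);
        return buffer;
    }

    public static void main(String[] args) {
        RefCountedBuffer unguarded = run(false);
        System.out.println("unguarded refCount = " + unguarded.refCount());
        try {
            unguarded.recycle(); // the shuffle side releases its own reference
        } catch (IllegalStateException e) {
            System.out.println("shuffle side fails: " + e.getMessage());
        }
        RefCountedBuffer guarded = run(true);
        System.out.println("guarded refCount = " + guarded.refCount());
        guarded.recycle(); // the shuffle side releases normally
    }
}
```

Running `main` prints `unguarded refCount = 0`, then `shuffle side fails: recycled a buffer that is already released`, then `guarded refCount = 1`. The unguarded run also consumes the reference owned by the shuffle side, which corresponds to the downstream failure described above. In the guarded run, exactly one release reaches the buffer.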
--
This message was sent by Atlassian Jira
(v8.3.4#803005)