[
https://issues.apache.org/jira/browse/FLINK-18050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated FLINK-18050:
-----------------------------------
Labels: pull-request-available (was: )
> Fix the bug of recycling buffer twice once exception in
> ChannelStateWriteRequestDispatcher#dispatch
> ---------------------------------------------------------------------------------------------------
>
> Key: FLINK-18050
> URL: https://issues.apache.org/jira/browse/FLINK-18050
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Checkpointing
> Affects Versions: 1.11.0
> Reporter: Zhijiang
> Assignee: Roman Khachatryan
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 1.11.0, 1.12.0
>
>
> When task finishes, the `CheckpointBarrierUnaligner` will decline the current
> checkpoint, which would write abort request into `ChannelStateWriter`.
> The abort request will be executed before other write output request in the
> queue, and close the underlying `CheckpointStateOutputStream`. Then when the
> dispatcher executes the next write output request to access the stream, it
> will throw ClosedByInterruptException to make dispatcher thread exit.
> In this process, the underlying buffers for current write output request will
> be recycled twice.
> * ChannelStateCheckpointWriter#write will recycle all the buffers in finally
> part, which can cover both exception and normal cases.
> * ChannelStateWriteRequestDispatcherImpl#dispatch will call
> `request.cancel(e)` to recycle the underlying buffers again in the case of
> exception.
> The effect of this bug can cause further exception in the network shuffle
> process, which references the same buffer as above, then this exception will
> send to the downstream side to make it failure.
>
> This bug can be reproduced easily via running
> UnalignedCheckpointITCase#shouldPerformUnalignedCheckpointOnParallelRemoteChannel.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)