[jira] [Updated] (FLINK-18050) Fix the bug of recycling buffer twice once exception in ChannelStateWriteRequestDispatcher#dispatch

ASF GitHub Bot (Jira) Wed, 03 Jun 2020 00:42:45 -0700


     [ 
https://issues.apache.org/jira/browse/FLINK-18050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


ASF GitHub Bot updated FLINK-18050:
-----------------------------------
    Labels: pull-request-available  (was: )

> Fix the bug of recycling buffer twice once exception in 
> ChannelStateWriteRequestDispatcher#dispatch
> ---------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-18050
>                 URL: https://issues.apache.org/jira/browse/FLINK-18050
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.11.0
>            Reporter: Zhijiang
>            Assignee: Roman Khachatryan
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 1.11.0, 1.12.0
>
>
> When a task finishes, the `CheckpointBarrierUnaligner` declines the current 
> checkpoint, which writes an abort request into the `ChannelStateWriter`.
> The abort request is executed ahead of the other write-output requests in the 
> queue and closes the underlying `CheckpointStateOutputStream`. When the 
> dispatcher then executes the next write-output request and accesses the 
> stream, a ClosedByInterruptException is thrown and the dispatcher thread exits.
> In this process, the underlying buffers of the current write-output request 
> are recycled twice:
>  * ChannelStateCheckpointWriter#write recycles all the buffers in its finally 
> block, which covers both the exceptional and the normal case.
>  * ChannelStateWriteRequestDispatcherImpl#dispatch calls 
> `request.cancel(e)`, which recycles the underlying buffers again when an 
> exception occurs.
> This bug can cause a further exception in the network shuffle process, which 
> references the same buffers; that exception is then sent to the downstream 
> side and makes it fail.
>  
> The bug is easy to reproduce by running 
> UnalignedCheckpointITCase#shouldPerformUnalignedCheckpointOnParallelRemoteChannel.
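
The double recycling described above can be illustrated with a minimal,
self-contained sketch. It does not use Flink's classes: CountingBuffer,
WriteRequest, write, dispatchBuggy and dispatchSingleOwner are hypothetical
stand-ins that only mimic the control flow from the description (a finally
block in the writer plus a cancel call in the dispatcher). The second dispatch
variant shows one possible remedy, leaving cleanup to a single owner; it is an
assumption for illustration and not necessarily what the pull request does.

```java
import java.io.IOException;
import java.nio.channels.ClosedByInterruptException;

public class DoubleRecycleSketch {

    /** Hypothetical stand-in for a pooled buffer; it only counts recycle calls. */
    static final class CountingBuffer {
        private int recycleCalls = 0;

        void recycle() {
            recycleCalls++;
        }

        int recycleCalls() {
            return recycleCalls;
        }
    }

    /** Hypothetical stand-in for a write-output request that holds one buffer. */
    static final class WriteRequest {
        final CountingBuffer buffer;

        WriteRequest(CountingBuffer buffer) {
            this.buffer = buffer;
        }

        /** Mimics request.cancel(e): releases the buffer held by the request. */
        void cancel(Throwable cause) {
            buffer.recycle();
        }
    }

    /** Mimics the writer: recycles in finally, for both the normal and the failing path. */
    static void write(WriteRequest request, boolean streamClosed) throws IOException {
        try {
            if (streamClosed) {
                // Stands in for accessing a stream that an earlier abort already closed.
                throw new ClosedByInterruptException();
            }
            // A real writer would copy the buffer contents into the stream here.
        } finally {
            request.buffer.recycle();
        }
    }

    /** Buggy dispatch: cancels on failure although write() has already recycled. */
    static void dispatchBuggy(WriteRequest request, boolean streamClosed) {
        try {
            write(request, streamClosed);
        } catch (IOException e) {
            request.cancel(e); // second recycle of the same buffer
        }
    }

    /** Possible remedy (assumption): write() is the only owner of the cleanup. */
    static void dispatchSingleOwner(WriteRequest request, boolean streamClosed) {
        try {
            write(request, streamClosed);
        } catch (IOException e) {
            // The buffer was already recycled in write(); do not cancel again.
        }
    }

    public static void main(String[] args) {
        CountingBuffer buggy = new CountingBuffer();
        dispatchBuggy(new WriteRequest(buggy), true);
        System.out.println("buggy dispatch recycle calls: " + buggy.recycleCalls());

        CountingBuffer fixed = new CountingBuffer();
        dispatchSingleOwner(new WriteRequest(fixed), true);
        System.out.println("single-owner dispatch recycle calls: " + fixed.recycleCalls());
    }
}
```

In this sketch the failing path recycles the buffer twice under dispatchBuggy
and once under dispatchSingleOwner. With a real reference-counted buffer, the
second recycle could release memory that another component still references,
which matches the follow-up failure in the network shuffle described above.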



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
