[jira] [Comment Edited] (FLINK-18050) Fix the bug of recycling buffer twice once exception in ChannelStateWriteRequestDispatcher#dispatch

Zhijiang (Jira) Tue, 09 Jun 2020 00:37:17 -0700


    [ https://issues.apache.org/jira/browse/FLINK-18050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17128958#comment-17128958 ]


Zhijiang edited comment on FLINK-18050 at 6/9/20, 7:36 AM:
-----------------------------------------------------------

Merged in master: 

ed7b0b1bea84a10ee45d10343f239cd183659a74, 

f2dd4b8500a82532dae17087c227ce34e1aeac9b

Merged in release-1.11: 

a233c0ff82273ca59bb1decdb1ffb6020d27ccfd,  

822e01b613b0b6821383f3cd5b0357054242b6a9


was (Author: zjwang):
Merged in master: ed7b0b1bea84a10ee45d10343f239cd183659a74, 

f2dd4b8500a82532dae17087c227ce34e1aeac9b

Merged in release-1.11: a233c0ff82273ca59bb1decdb1ffb6020d27ccfd,  
822e01b613b0b6821383f3cd5b0357054242b6a9

> Fix the bug of recycling buffer twice once exception in 
> ChannelStateWriteRequestDispatcher#dispatch
> ---------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-18050
>                 URL: https://issues.apache.org/jira/browse/FLINK-18050
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.11.0
>            Reporter: Zhijiang
>            Assignee: Roman Khachatryan
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 1.11.0, 1.12.0
>
>
> When a task finishes, the `CheckpointBarrierUnaligner` declines the current 
> checkpoint, which writes an abort request into the `ChannelStateWriter`.
> The abort request is executed before the other write-output requests in the 
> queue and closes the underlying `CheckpointStateOutputStream`. When the 
> dispatcher then executes the next write-output request and accesses the 
> stream, it throws ClosedByInterruptException and the dispatcher thread exits.
> In this process, the underlying buffers of the current write-output request 
> are recycled twice:
>  * ChannelStateCheckpointWriter#write recycles all the buffers in its finally 
> block, which covers both the exception and the normal case.
>  * ChannelStateWriteRequestDispatcherImpl#dispatch calls 
> `request.cancel(e)`, which recycles the underlying buffers again when an 
> exception occurs.
> As a result, the network shuffle process, which references the same buffers, 
> can hit a further exception. That exception is then sent to the downstream 
> side and causes it to fail.
>  
> This bug can be reproduced easily by running 
> UnalignedCheckpointITCase#shouldPerformUnalignedCheckpointOnParallelRemoteChannel.
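
For readers who want to see the double recycle in isolation, here is a minimal, self-contained Java sketch of the sequence described in the issue. It does not use Flink's classes: `Buffer`, `Stream`, `Request`, `write`, `dispatch` and `runScenario` are hypothetical stand-ins, and a plain `IOException` stands in for the `ClosedByInterruptException`. The `releaseOnce` guard is only one possible way to make the release idempotent and is an assumption for illustration. It is not necessarily what the merged commits do.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class DoubleRecycleSketch {

    /** Hypothetical stand-in for a pooled network buffer; it only counts recycle() calls. */
    static final class Buffer {
        private final AtomicInteger recycleCalls = new AtomicInteger();

        void recycle() {
            recycleCalls.incrementAndGet();
        }

        int recycleCalls() {
            return recycleCalls.get();
        }
    }

    /** Hypothetical stand-in for the checkpoint output stream. */
    static final class Stream {
        private boolean closed;

        void close() {
            closed = true;
        }

        void write(Buffer buffer) throws IOException {
            if (closed) {
                throw new IOException("stream already closed by abort request");
            }
        }
    }

    /** Hypothetical write request that owns one buffer. */
    static final class Request {
        private final Buffer buffer;
        private final boolean releaseOnce;
        private final AtomicBoolean released = new AtomicBoolean();

        Request(Buffer buffer, boolean releaseOnce) {
            this.buffer = buffer;
            this.releaseOnce = releaseOnce;
        }

        /** Unguarded: recycles on every call. Guarded: recycles on the first call only. */
        void release() {
            if (!releaseOnce || released.compareAndSet(false, true)) {
                buffer.recycle();
            }
        }
    }

    /** Mirrors the writer side: releases in finally, on success and on failure. */
    static void write(Stream stream, Request request) throws IOException {
        try {
            stream.write(request.buffer);
        } finally {
            request.release();
        }
    }

    /** Mirrors the dispatcher side: on exception, cancels the request, which releases again. */
    static void dispatch(Stream stream, Request request) {
        try {
            write(stream, request);
        } catch (IOException e) {
            request.release(); // stands in for request.cancel(e)
        }
    }

    /** Runs "abort first, then write" and returns how often the buffer was recycled. */
    public static int runScenario(boolean releaseOnce) {
        Stream stream = new Stream();
        stream.close(); // the abort request ran before the pending write request
        Buffer buffer = new Buffer();
        dispatch(stream, new Request(buffer, releaseOnce));
        return buffer.recycleCalls();
    }

    public static void main(String[] args) {
        System.out.println("unguarded recycle calls: " + runScenario(false)); // prints 2
        System.out.println("guarded recycle calls:   " + runScenario(true));  // prints 1
    }
}
```

In the unguarded run the buffer is recycled once by the finally block and once more by the cancel path, which is the pattern the two bullet points above describe. With the guard, whichever path releases first wins and the second call is a no-op.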



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
