[ https://issues.apache.org/jira/browse/FLINK-17823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhijiang updated FLINK-17823:
-----------------------------
Comment: was deleted
(was: Merged in master: 8c7c7267be95cddd7122d2b97e5334f5db4cc37c)
> Resolve the race condition while releasing RemoteInputChannel
> -------------------------------------------------------------
>
> Key: FLINK-17823
> URL: https://issues.apache.org/jira/browse/FLINK-17823
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Network
> Affects Versions: 1.11.0
> Reporter: Zhijiang
> Assignee: Zhijiang
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 1.11.0
>
>
> RemoteInputChannel#releaseAllResources might be called by the canceler thread
> while the task thread is concurrently calling RemoteInputChannel#getNextBuffer.
> This can cause two potential problems:
> * The task thread might get a null buffer after the canceler thread has already
> released all the buffers, which causes a misleading NPE in getNextBuffer.
> * The task thread and the canceler thread might pull the same buffer concurrently,
> which causes an unexpected exception when the same buffer is recycled twice.
> The solution is to properly synchronize access to the buffer queue in the release
> method, so that the same buffer cannot be pulled by both the canceler thread and
> the task thread. In getNextBuffer, we also add explicit checks to avoid the
> misleading NPE and raise a more meaningful exception instead.
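
The approach described above can be illustrated with a minimal sketch. This is not the actual Flink change: SketchBuffer, SketchInputChannel, onBuffer and the exception messages are hypothetical names made up for illustration, and the real RemoteInputChannel is considerably more involved. The sketch only shows the two ideas from the description: draining the queue under the same lock that getNextBuffer uses, and checking explicitly for a released channel instead of dereferencing a null buffer.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a network buffer; recycling it twice is an error. */
final class SketchBuffer {
    private final AtomicBoolean recycled = new AtomicBoolean(false);

    void recycle() {
        if (!recycled.compareAndSet(false, true)) {
            throw new IllegalStateException("Buffer recycled twice.");
        }
    }

    boolean isRecycled() {
        return recycled.get();
    }
}

/** Hypothetical, heavily simplified channel; not the real RemoteInputChannel. */
final class SketchInputChannel {
    private final ArrayDeque<SketchBuffer> receivedBuffers = new ArrayDeque<>();
    private final AtomicBoolean isReleased = new AtomicBoolean(false);

    /** Assumed entry point for incoming data (e.g. called by a network thread). */
    void onBuffer(SketchBuffer buffer) {
        synchronized (receivedBuffers) {
            if (!isReleased.get()) {
                receivedBuffers.add(buffer);
                return;
            }
        }
        // The channel was already released, so nobody else will see this buffer.
        buffer.recycle();
    }

    /** Called by the task thread. */
    Optional<SketchBuffer> getNextBuffer() {
        final SketchBuffer next;
        synchronized (receivedBuffers) {
            // Each buffer leaves the queue exactly once, under the lock.
            next = receivedBuffers.poll();
        }
        if (next == null) {
            // Explicit check: report the release instead of failing with an NPE later.
            if (isReleased.get()) {
                throw new IllegalStateException(
                        "Queried for a buffer after the channel has been released.");
            }
            return Optional.empty();
        }
        return Optional.of(next);
    }

    /** Called by the canceler thread; safe to call more than once. */
    void releaseAllResources() {
        if (isReleased.compareAndSet(false, true)) {
            final List<SketchBuffer> toRecycle;
            synchronized (receivedBuffers) {
                // Drain under the same lock as getNextBuffer, so a buffer is
                // owned either by the task thread or by this method, never both.
                toRecycle = new ArrayList<>(receivedBuffers);
                receivedBuffers.clear();
            }
            for (SketchBuffer buffer : toRecycle) {
                buffer.recycle();
            }
        }
    }
}
```

In this sketch a buffer handed out by getNextBuffer before the release stays owned by the task thread, while everything still queued is recycled exactly once by releaseAllResources. Recycling happens outside the lock to keep the critical section short.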
--
This message was sent by Atlassian Jira
(v8.3.4#803005)