[
https://issues.apache.org/jira/browse/FLINK-4021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341655#comment-15341655
]
ASF GitHub Bot commented on FLINK-4021:
---------------------------------------
Github user uce commented on the issue:
https://github.com/apache/flink/pull/2141
Thank you for this PR. I will try to look into it next week. I think we
should wait for the 1.1 release before we merge this though.
> Problem of setting autoread for netty channel when more tasks sharing the
> same Tcp connection
> ---------------------------------------------------------------------------------------------
>
> Key: FLINK-4021
> URL: https://issues.apache.org/jira/browse/FLINK-4021
> Project: Flink
> Issue Type: Bug
> Components: Distributed Coordination
> Affects Versions: 1.0.2
> Reporter: Zhijiang Wang
> Assignee: Zhijiang Wang
>
> More than one task can share the same TCP connection for shuffling data.
> If a downstream task, say "A", has no available memory segment to read a
> netty buffer from the network, it sets autoread to false for the channel.
> When task A fails or has available segments again, the netty handler is
> notified to process the staged buffers first and then reset autoread to
> true. In some scenarios, however, autoread is never set back to true.
> When processing the staged buffers, the handler first finds the
> corresponding input channel for each buffer. If the task for that input
> channel has failed, the decodeMsg method in PartitionRequestClientHandler
> returns false, which means autoread is never set back to true.
> In summary, suppose task "A" sets autoread to false because it has no
> available segments, which leaves some staged buffers. If another task "B",
> which owns one of those staged buffers, fails in the meantime, then task A's
> attempt to reset autoread to true cannot succeed because task B has failed.
> I have fixed this problem in our application by adding a boolean parameter
> to the decodeBufferOrEvent method to distinguish whether the method is invoked
> by the netty IO thread's channel read or by the staged message handler task in
> PartitionRequestClientHandler.
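
The scenario described in the issue can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions, not Flink's actual code: the class AutoReadSketch, its fields, and the drainStagedBuffers and runScenario methods are hypothetical, and the real decodeBufferOrEvent in PartitionRequestClientHandler has a different signature. The boolean isStagedBuffer stands in for the extra parameter the reporter mentions, and the way it is used here (dropping a staged buffer whose task has failed) is an assumption about the fix.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

// Simplified, hypothetical model of the handler; it does not use Netty or Flink classes.
class AutoReadSketch {
    private boolean autoRead = true;
    // Each staged entry is just the name of the task that should receive the buffer.
    private final Queue<String> stagedBuffers = new ArrayDeque<>();
    private final Set<String> failedTasks = new HashSet<>();
    // When true, the handler distinguishes staged-buffer processing from normal channel reads.
    private final boolean distinguishCaller;

    AutoReadSketch(boolean distinguishCaller) {
        this.distinguishCaller = distinguishCaller;
    }

    // A task ran out of memory segments: stage the buffer and stop reading from the channel.
    void stage(String receiverTask) {
        stagedBuffers.add(receiverTask);
        autoRead = false;
    }

    void fail(String task) {
        failedTasks.add(task);
    }

    // Returns true if the buffer was consumed or dropped, false if it could not be handled.
    boolean decodeBufferOrEvent(String receiverTask, boolean isStagedBuffer) {
        if (failedTasks.contains(receiverTask)) {
            // Assumed fix: a staged buffer for a failed task is dropped so draining can go on.
            return distinguishCaller && isStagedBuffer;
        }
        return true;
    }

    // Called when segments become available again: drain staged buffers, then re-enable autoread.
    void drainStagedBuffers() {
        while (!stagedBuffers.isEmpty()) {
            if (!decodeBufferOrEvent(stagedBuffers.peek(), true)) {
                return; // autoRead stays false: this is the stuck state from the issue
            }
            stagedBuffers.poll();
        }
        autoRead = true;
    }

    // Replays the A/B scenario and reports whether autoread was restored.
    static boolean runScenario(boolean distinguishCaller) {
        AutoReadSketch handler = new AutoReadSketch(distinguishCaller);
        handler.stage("A"); // task A has no segment, autoread is switched off
        handler.stage("B"); // a buffer for task B is staged behind it
        handler.fail("B");  // task B fails before the staged buffers are drained
        handler.drainStagedBuffers();
        return handler.autoRead;
    }

    public static void main(String[] args) {
        System.out.println("without flag: autoRead=" + runScenario(false));
        System.out.println("with flag:    autoRead=" + runScenario(true));
    }
}
```

In this model, the run without the flag leaves autoread disabled for every task sharing the connection, while the run with the flag drops the buffer belonging to the failed task and restores autoread.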