[
https://issues.apache.org/jira/browse/BOOKKEEPER-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737099#comment-13737099
]
Sijie Guo commented on BOOKKEEPER-668:
--------------------------------------
{quote}
This will cause a request to be wrongly failed with the attached patch. This
isn't as bad as the current situation, but I wouldn't call it good.
{quote}
Yes, as in my previous comment. The eventual behavior is that the requests
fail, are retried, and are finally sent to the right bookie, since it is
difficult to predict when channelDisconnected() would be called. For this
issue, I would expect a simple fix for both 4.2.2 and 4.3.0 rather than a
refactoring.
> Race between PerChannelBookieClient#channelDisconnected() and disconnect()
> calls can make clients hang while add/reading entries in case of multiple
> bookie failures
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: BOOKKEEPER-668
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-668
> Project: Bookkeeper
> Issue Type: Bug
> Components: bookkeeper-client
> Affects Versions: 4.2.1, 4.3.0
> Reporter: Vinay
> Assignee: Sijie Guo
> Fix For: 4.2.2, 4.3.0
>
> Attachments: BOOKKEEPER-668.diff, BOOKKEEPER-668-test.diff
>
>
> 1. A ledger was created with ensemble size 2 and quorum size 2, and entries
> were written.
> 2. While reading entries, 2 of the 3 bookies in the cluster were killed and
> restarted.
> 3. The client hung in the read call, waiting for the sync counter notification.
> Although I was not able to reproduce this in a few tries, from the logs and
> the code the following seems to be the problem:
> 1. BookieWatcher was notified first of the change in available bookies.
> 2. PerChannelBookieClient#disconnect() was then called from BookieWatcher for
> the failed bookies. This set 'this.channel=null;'.
> 3. The PerChannelBookieClient#channelDisconnected() call arrived only after
> that, and it proceeded silently without notifying the read ops of the error.
> So the client hangs waiting for a result.
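
The ordering described in the three steps above can be modeled with a small, self-contained sketch. This is not the real PerChannelBookieClient code: the class ChannelRaceSketch, the Client model, the failPendingOnDisconnect switch, the error code, and the guard inside channelDisconnected() are all assumptions made for illustration. The switch shows one possible mitigation (failing pending ops from disconnect() so they can be retried), which is not necessarily what the attached patch does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

/** Simplified model of the reported race; NOT the real PerChannelBookieClient. */
public class ChannelRaceSketch {
    // Hypothetical error code delivered to callbacks of failed reads.
    static final int BOOKIE_HANDLE_NOT_AVAILABLE = -1;

    static class Client {
        // Hypothetical switch: false models the reported behavior,
        // true models one possible mitigation.
        private final boolean failPendingOnDisconnect;
        private Object channel = new Object();
        private final List<IntConsumer> pendingReads = new ArrayList<>();

        Client(boolean failPendingOnDisconnect) {
            this.failPendingOnDisconnect = failPendingOnDisconnect;
        }

        synchronized Object currentChannel() {
            return channel;
        }

        synchronized void addPendingRead(IntConsumer callback) {
            pendingReads.add(callback);
        }

        /** Watcher path: the bookie was reported as gone. */
        synchronized void disconnect() {
            channel = null;
            if (failPendingOnDisconnect) {
                errorOutPending();
            }
        }

        /** Network path: the channel actually closed. */
        synchronized void channelDisconnected(Object closedChannel) {
            if (closedChannel != channel) {
                // Assumed guard: the event looks stale because disconnect()
                // already cleared the field, so nothing is notified here.
                return;
            }
            channel = null;
            errorOutPending();
        }

        private void errorOutPending() {
            List<IntConsumer> copy = new ArrayList<>(pendingReads);
            pendingReads.clear();
            for (IntConsumer callback : copy) {
                callback.accept(BOOKIE_HANDLE_NOT_AVAILABLE);
            }
        }
    }

    /** Replays the reported event order; returns how many read callbacks fired. */
    public static int replay(boolean failPendingOnDisconnect) {
        Client client = new Client(failPendingOnDisconnect);
        Object ch = client.currentChannel();
        int[] notified = {0};
        client.addPendingRead(rc -> notified[0]++);
        client.disconnect();            // step 2: watcher clears the channel first
        client.channelDisconnected(ch); // step 3: network event arrives afterwards
        return notified[0];
    }

    public static void main(String[] args) {
        System.out.println("reported ordering, callbacks fired: " + replay(false));
        System.out.println("pending ops failed in disconnect(): " + replay(true));
    }
}
```

In this model, replay(false) leaves the pending read without any callback, which corresponds to the hung client, while replay(true) fails the read exactly once so that the caller could retry it against another bookie.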