[
https://issues.apache.org/jira/browse/BOOKKEEPER-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736973#comment-13736973
]
Ivan Kelly commented on BOOKKEEPER-668:
---------------------------------------
I added a test for this. I don't think the fix is right. Consider the following
case.
# Client calls connect() [state=CONNECTING]
# Client calls disconnect() before connect finishes [state=DISCONNECTED]
# Client calls connect() [state=CONNECTING]
# Connect completes, client writes request [state=CONNECTED]
# channelDisconnected() from previous disconnect() called
With the attached patch, this sequence causes the request written in step 4 to
be wrongly failed. That isn't as bad as the current situation, but I wouldn't
call it good. It would be better to have requests owned by the channel on which
they were sent. This is a pretty big refactor though :/
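
To make the idea concrete, here is a rough sketch of what "requests owned by
the channel" could look like. It is not the PerChannelBookieClient code: the
Channel class, the pending map and the method signatures are all hypothetical,
and the CONNECTING state is not modelled. The only point is that a late
channelDisconnected() fails what the disconnected channel owns and nothing
else.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ChannelOwnedRequests {
    /** Hypothetical stand-in for a network channel; it owns the requests written to it. */
    static final class Channel {
        final int id;
        final Map<Long, String> pending = new HashMap<>();
        Channel(int id) { this.id = id; }
    }

    Channel current;                                // plays the role of 'this.channel'
    int nextChannelId = 0;
    final List<String> failed = new ArrayList<>();  // requests that were failed back to the caller

    Channel connect() {
        current = new Channel(nextChannelId++);
        return current;
    }

    Channel disconnect() {
        Channel old = current;
        current = null;
        return old;
    }

    void write(long requestId, String description) {
        // The request is recorded on the channel it is sent on, not on the client.
        current.pending.put(requestId, description);
    }

    /** Disconnect event for one specific channel: fail only what that channel owns. */
    void channelDisconnected(Channel source) {
        failed.addAll(source.pending.values());
        source.pending.clear();
    }

    public static void main(String[] args) {
        ChannelOwnedRequests client = new ChannelOwnedRequests();
        client.connect();                           // first connect()
        Channel stale = client.disconnect();        // disconnect() before it is used
        Channel live = client.connect();            // second connect()
        client.write(1L, "read entry 0");           // request goes out on the new channel
        client.channelDisconnected(stale);          // late event from the old channel
        System.out.println("after stale event: failed=" + client.failed
                + " pending=" + live.pending.values());

        client.disconnect();                        // sets the current channel to null
        client.channelDisconnected(live);           // event still fails the channel's own requests
        System.out.println("after live event: failed=" + client.failed);
    }
}
```

In this sketch the stale event leaves the new request pending, and a real
disconnect of the live channel still fails it even after disconnect() has
cleared the current channel, which is the case from the original report.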
> Race between PerChannelBookieClient#channelDisconnected() and disconnect()
> calls can make clients hang while add/reading entries in case of multiple
> bookie failures
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: BOOKKEEPER-668
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-668
> Project: Bookkeeper
> Issue Type: Bug
> Components: bookkeeper-client
> Affects Versions: 4.2.1, 4.3.0
> Reporter: Vinay
> Assignee: Sijie Guo
> Fix For: 4.2.2, 4.3.0
>
> Attachments: BOOKKEEPER-668.diff, BOOKKEEPER-668-test.diff
>
>
> 1. A ledger was created with ensemble size 2 and quorum size 2, and entries
> were written.
> 2. While reading entries, 2 BKs out of 3 in the cluster were killed and restarted.
> 3. The client hung in the read call, waiting for the sync counter notification.
> Although I was not able to reproduce this in a few tries, from looking at
> the logs and code, the following seems to be the problem:
> 1. BookieWatcher got the notification first for changes in available bookies.
> 2. PerChannelBookieClient#disconnect() was called from BookieWatcher for the
> failed bookies. This set 'this.channel=null;'
> 3. PerChannelBookieClient#channelDisconnected() was then called, and it
> proceeded silently without notifying the read ops of the error.
> So the client hung waiting for a result.