[ https://issues.apache.org/jira/browse/BOOKKEEPER-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736973#comment-13736973 ]

Ivan Kelly commented on BOOKKEEPER-668:
---------------------------------------

I added a test for this. I don't think the fix is right, though. Consider the 
following case:

# The client calls connect() [state=CONNECTING]
# The client calls disconnect() before the connect finishes [state=DISCONNECTED]
# The client calls connect() again [state=CONNECTING]
# The connect completes and the client writes a request [state=CONNECTED]
# channelDisconnected() for the earlier disconnect() is called

With the attached patch, this causes the request written in step 4 to be failed 
wrongly. That isn't as bad as the current situation, but I wouldn't call it 
good. A better approach would be for each request to be owned by the channel it 
was sent out on. That is a pretty big refactor, though :/
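A minimal sketch of the "requests owned by their channel" idea, written in Java because the BookKeeper client is Java. Everything in it is hypothetical: `ChannelOwnedRequests`, `Channel`, `Op` and their methods are illustrative stand-ins, not PerChannelBookieClient's API, and connects are modeled as synchronous for brevity. Each outstanding request is stored under the channel it was written to, so a late channelDisconnected() for an old channel can only fail that channel's requests.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not BookKeeper code: requests are keyed by the channel
// they were written to. Connects are modeled as synchronous for brevity.
class ChannelOwnedRequests {

    // Stand-in for a network channel; only object identity matters here.
    static final class Channel {
        final int id;
        Channel(int id) { this.id = id; }
    }

    // A pending request and how it ended up.
    static final class Op {
        String outcome = "PENDING";
    }

    private final Map<Channel, List<Op>> opsByChannel = new HashMap<>();
    private Channel current; // channel new requests go to; null when disconnected

    synchronized Channel connect(int id) {
        current = new Channel(id);
        opsByChannel.put(current, new ArrayList<>());
        return current;
    }

    // Drops the reference only; the transport later reports channelDisconnected(old).
    synchronized void disconnect() {
        current = null;
    }

    synchronized void write(Op op) {
        if (current == null) {
            op.outcome = "FAILED"; // nowhere to send it
            return;
        }
        opsByChannel.get(current).add(op); // the op is now owned by this channel
    }

    // Transport callback naming the channel that closed: fail only its own ops.
    synchronized void channelDisconnected(Channel closed) {
        List<Op> owned = opsByChannel.remove(closed);
        if (owned != null) {
            for (Op op : owned) {
                op.outcome = "FAILED";
            }
        }
        if (closed == current) {
            current = null;
        }
    }

    // Replays the five steps above and returns the new request's outcome.
    static String staleCallbackScenario() {
        ChannelOwnedRequests client = new ChannelOwnedRequests();
        Channel first = client.connect(1);  // 1. connect()    [CONNECTING]
        client.disconnect();                // 2. disconnect() [DISCONNECTED]
        client.connect(2);                  // 3. connect()    [CONNECTING]
        Op request = new Op();
        client.write(request);              // 4. write on channel 2 [CONNECTED]
        client.channelDisconnected(first);  // 5. late callback for channel 1
        return request.outcome;
    }

    // An op written before disconnect() is still failed by its channel's callback.
    static String disconnectThenCallbackScenario() {
        ChannelOwnedRequests client = new ChannelOwnedRequests();
        Channel only = client.connect(1);
        Op read = new Op();
        client.write(read);
        client.disconnect();                // current = null, like 'this.channel=null;'
        client.channelDisconnected(only);   // still finds and fails the owned read
        return read.outcome;
    }

    public static void main(String[] args) {
        System.out.println(staleCallbackScenario());          // PENDING
        System.out.println(disconnectThenCallbackScenario()); // FAILED
    }
}
```

Because the callback is told which channel closed, it never has to consult the shared "current channel" field. In this sketch that also covers the hang from the original report: disconnect() clears `current`, but the closed channel's requests are still found and failed.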
                
> Race between PerChannelBookieClient#channelDisconnected() and disconnect() 
> calls can make clients hang while add/reading entries in case of multiple 
> bookie failures
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: BOOKKEEPER-668
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-668
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-client
>    Affects Versions: 4.2.1, 4.3.0
>            Reporter: Vinay
>            Assignee: Sijie Guo
>             Fix For: 4.2.2, 4.3.0
>
>         Attachments: BOOKKEEPER-668.diff, BOOKKEEPER-668-test.diff
>
>
> 1. A ledger was created with ensemble size 2 and quorum size 2, and entries 
> were written.
> 2. While the entries were being read, 2 of the 3 bookies in the cluster were 
> killed and restarted.
> 3. The client hung in the read call, waiting for the sync counter notification.
> Although I was not able to reproduce this in a few attempts, the logs and 
> code suggest the following sequence:
> 1. BookieWatcher was notified first of the change in available bookies.
> 2. BookieWatcher called PerChannelBookieClient#disconnect() for the failed 
> bookies, which set 'this.channel=null;'.
> 3. PerChannelBookieClient#channelDisconnected() was then called, and it 
> returned silently without notifying the pending read ops of the error.
> So the client hangs waiting for a result.
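The reported hang can be modeled in a few lines. This is a hypothetical, single-threaded Java sketch, not PerChannelBookieClient: `NaiveClient`, `PendingRead`, and the early-return null check in channelDisconnected() are assumptions about how the callback "returned silently", and a CountDownLatch with a 100 ms timeout stands in for the sync counter the reader waits on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical model of the reported race; not the real PerChannelBookieClient.
class DisconnectRaceSketch {

    // A pending read: the caller blocks on 'done' until a result or error arrives.
    static final class PendingRead {
        final CountDownLatch done = new CountDownLatch(1);
        volatile String result;

        void complete(String r) {
            result = r;
            done.countDown();
        }
    }

    static final class NaiveClient {
        Object channel = new Object(); // stand-in for the open channel
        final List<PendingRead> pending = new ArrayList<>();

        synchronized void read(PendingRead op) {
            pending.add(op);
        }

        // Step 2: the BookieWatcher-triggered disconnect only drops the channel.
        synchronized void disconnect() {
            channel = null;
        }

        // Step 3 (assumed guard): a null channel is treated as "already handled".
        synchronized void channelDisconnected() {
            if (channel == null) {
                return; // pending reads are never notified
            }
            channel = null;
            for (PendingRead op : pending) {
                op.complete("ERROR");
            }
            pending.clear();
        }
    }

    // True if the read completed within the timeout, false if it "hangs".
    static boolean runRace(boolean disconnectFirst) {
        NaiveClient client = new NaiveClient();
        PendingRead op = new PendingRead();
        client.read(op);
        if (disconnectFirst) {
            client.disconnect();          // BookieWatcher wins the race
        }
        client.channelDisconnected();     // transport callback arrives
        try {
            return op.done.await(100, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(runRace(false)); // true: callback alone fails the read
        System.out.println(runRace(true));  // false: the read is never notified
    }
}
```

Only the ordering differs between the two runs, which matches the report that the problem is hard to reproduce: the client hangs only when disconnect() gets in before the channelDisconnected() callback.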
