[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093662#comment-13093662
 ] 

Ivan Kelly commented on BOOKKEEPER-56:
--------------------------------------

It seems inelegant to have to look up the delivery handler every time, when the 
message has already arrived in an object which can know how to deliver it. 
Perhaps we could add a package private method on HedwigSubscriber, called 
restartDelivery, which gets the handler from the hashmap and sets it in the 
response handler. In this case, the patch wouldn't modify the response handler 
at all, just how the reconnect callback sets it.

The correct behaviour in this case is that the reconnect callback should not be 
able to overwrite the message handler. I think it is also valid to broaden this 
to say that noone should ever be able to overwrite the message handler, as this 
would indicate that startDelivery had been called twice without stopDelivery 
being called in between, which would indicate a programming error on the part 
of the client. 

There are tabs in the patch. For BK/HW the standard is 4 space indentation.

> Race condition of message handler in connection recovery in Hedwig client
> -------------------------------------------------------------------------
>
>                 Key: BOOKKEEPER-56
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-56
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: hedwig-client
>    Affects Versions: 3.4.0
>            Reporter: Gavin Li
>             Fix For: 3.4.0
>
>         Attachments: patch_56
>
>
> There's a race condition in the connection recovery logic in Hedwig client. 
> The message handler user set might be overwritten incorrectly. 
> When handling channelDisconnected event, we try to reconnect to Hedwig 
> server. After the connection is created and subscribed, we'll call 
> StartDelivery() to recover the message handler to the original one of the 
> disconnected connection. But if during this process, user calls 
> StartDelivery() to set a new message handler, it will get overwritten to the 
> original one.
> The process can be demonstrated as below:
> main thread__________________________________netty worker thread
> __________________________________________________________________________________________________
> StartDelivery(messageHandlerA)
> (connection Broken here, and recovered later...)
> ____________________________________________ResponseHandler::channelDisconnected()
>    (connection disconnected event received)
> ____________________________________________new 
> SubscribeReconnectCallback(subHandler.getMessageHandler()) (store 
> messageHandlerA in SubscribeReconnectCallback to recover later)
> ____________________________________________client.doConnect() (try reconnect)
> ____________________________________________doSubUnsub() (resubscribe)
> ____________________________________________SubscriberResponseHandler::handleSubscribeResponse()
>   (subscription succeeds)
> StartDelivery(messageHandlderB)
> ____________________________________________SubscribeReconnectCallback::operationFinished()
> ____________________________________________StartDelvery(messageHandlerA)   
> (messageHandler get overwritten)
> I can stably reproduce this by simulating this race condition by put some 
> sleep in ResponseHandler.
> I think essentially speaking we should not store messageHandler in 
> ResponseHandler, since the message handler is supposed to be bound to 
> connection. Instead, no matter which connection is in use, we should use the 
> same messageHandler, the one user set last time. So I think we should change 
> to store messageHandler in the HedwigSubscriber, in this way we don't need to 
> recover the handler in connection recovery and thus won't face this race 
> condition.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to