[
https://issues.apache.org/jira/browse/BOOKKEEPER-56?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Flavio Junqueira reassigned BOOKKEEPER-56:
------------------------------------------
Assignee: Gavin Li
> Race condition of message handler in connection recovery in Hedwig client
> -------------------------------------------------------------------------
>
> Key: BOOKKEEPER-56
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-56
> Project: Bookkeeper
> Issue Type: Bug
> Components: hedwig-client
> Affects Versions: 3.4.0
> Reporter: Gavin Li
> Assignee: Gavin Li
> Fix For: 3.4.0
>
> Attachments: patch_56
>
>
> There's a race condition in the connection recovery logic of the Hedwig
> client: the message handler set by the user may be incorrectly overwritten.
> When handling a channelDisconnected event, we try to reconnect to the Hedwig
> server. After the new connection is created and subscribed, we call
> StartDelivery() to restore the message handler of the disconnected
> connection. But if the user calls StartDelivery() with a new message handler
> while this recovery is in progress, the new handler gets overwritten by the
> original one.
> The sequence is shown below:
> main thread__________________________________netty worker thread
> __________________________________________________________________________________________________
> StartDelivery(messageHandlerA)
> (connection Broken here, and recovered later...)
> ____________________________________________ResponseHandler::channelDisconnected()
> (connection disconnected event received)
> ____________________________________________new SubscribeReconnectCallback(subHandler.getMessageHandler())
> (store messageHandlerA in SubscribeReconnectCallback to recover later)
> ____________________________________________client.doConnect() (try reconnect)
> ____________________________________________doSubUnsub() (resubscribe)
> ____________________________________________SubscriberResponseHandler::handleSubscribeResponse()
> (subscription succeeds)
> StartDelivery(messageHandlerB)
> ____________________________________________SubscribeReconnectCallback::operationFinished()
> ____________________________________________StartDelivery(messageHandlerA)
> (messageHandlerB gets overwritten)
> I can reliably reproduce this by adding a sleep in ResponseHandler to widen
> the race window.
> Fundamentally, I think we should not store the messageHandler in
> ResponseHandler, since ResponseHandler is bound to a connection and the
> message handler should not be. Instead, no matter which connection is in
> use, we should use the same messageHandler: the one the user set most
> recently. So I propose storing the messageHandler in HedwigSubscriber. That
> way we don't need to restore the handler during connection recovery, and the
> race condition goes away.
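For illustration only, the interleaving above and the proposed change can be replayed deterministically in a small standalone sketch. This is not Hedwig code and not the attached patch: the class names (HandlerRaceSketch, PerConnectionState, SubscriberState) and the one-method MessageHandler interface are hypothetical stand-ins, and the two threads are flattened into a single sequence of steps in the order given in the report.

```java
import java.util.concurrent.atomic.AtomicReference;

public class HandlerRaceSketch {
    /** Hypothetical stand-in for a message handler; a name is enough here. */
    public interface MessageHandler { String name(); }

    /** Simplified current design: the handler lives on the per-connection object. */
    static final class PerConnectionState {
        MessageHandler handler;
    }

    /** Simplified proposed design: the subscriber owns the handler across connections. */
    static final class SubscriberState {
        private final AtomicReference<MessageHandler> handler = new AtomicReference<>();
        void startDelivery(MessageHandler h) { handler.set(h); }
        MessageHandler current() { return handler.get(); }
    }

    /** Replays the reported interleaving with a per-connection handler. */
    public static String replayWithPerConnectionHandler(MessageHandler a, MessageHandler b) {
        PerConnectionState oldConn = new PerConnectionState();
        oldConn.handler = a;                                   // main: StartDelivery(A)
        MessageHandler saved = oldConn.handler;                // worker: callback captures A
        PerConnectionState newConn = new PerConnectionState(); // worker: reconnect, resubscribe
        newConn.handler = b;                                   // main: StartDelivery(B)
        newConn.handler = saved;                               // worker: restore A, B is lost
        return newConn.handler.name();
    }

    /** Replays the same steps when the subscriber owns the handler. */
    public static String replayWithSubscriberOwnedHandler(MessageHandler a, MessageHandler b) {
        SubscriberState sub = new SubscriberState();
        sub.startDelivery(a);                                  // main: StartDelivery(A)
        // worker: reconnect and resubscribe; there is nothing to capture or restore
        sub.startDelivery(b);                                  // main: StartDelivery(B)
        return sub.current().name();                           // the latest user choice survives
    }

    public static void main(String[] args) {
        MessageHandler a = () -> "A";
        MessageHandler b = () -> "B";
        System.out.println("per-connection: " + replayWithPerConnectionHandler(a, b));
        System.out.println("subscriber-owned: " + replayWithSubscriberOwnedHandler(a, b));
    }
}
```

In this sketch the per-connection replay ends with handler A installed, while the subscriber-owned replay ends with handler B, because the recovery path never writes the handler at all.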
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira