[ https://issues.apache.org/jira/browse/AVRO-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195894#comment-13195894 ]
James Baldassari commented on AVRO-1013:
----------------------------------------
The second change I described, to NettyTransceiver#isConnected(), actually
broke a bunch of unit tests, so I'm just going to leave that method unchanged.
> NettyTransceiver can hang after server restart
> ----------------------------------------------
>
> Key: AVRO-1013
> URL: https://issues.apache.org/jira/browse/AVRO-1013
> Project: Avro
> Issue Type: Bug
> Affects Versions: 1.6.1
> Reporter: James Baldassari
> Priority: Blocker
>
> I ran into a very specific scenario today which can lead to NettyTransceiver
> hanging indefinitely:
> # Start up a NettyServer
> # Initialize a NettyTransceiver and SpecificRequestor
> # Execute an RPC to establish the connection/handshake with the server
> # Shut down the server
> # Immediately execute another RPC
> After Step 4, NettyTransceiver will detect that the connection has been
> closed and call NettyTransceiver#disconnect(boolean, boolean, Throwable),
> which sets 'remote' to null, indicating to Requestor that the
> NettyTransceiver is now disconnected. However, if an RPC is executed just
> after the server has closed its socket (Step 5) and before disconnect() has
> been called, NettyTransceiver may still try to send this RPC because 'remote'
> has not yet been set to null. This race condition is normally ok because
> NettyTransceiver#getChannel() will detect that the socket has been closed and
> then try to reestablish the connection. Unfortunately, in this scenario
> getChannel() blocks forever when it attempts to acquire the write lock
> because the read lock has been acquired twice rather than once as
> getChannel() expects. The read lock is acquired once by
> transceive(List<ByteBuffer>, Callback<List<ByteBuffer>>) and again by
> writeDataPack(NettyDataPack).
> The fix is fairly simple. The writeDataPack(NettyDataPack) method (which is
> private) no longer acquires the read lock; instead, its contract specifies
> that the read lock must be acquired before calling this method. This change
> prevents the read lock from being acquired more than once by any single thread.
> Another change is to have NettyTransceiver#isConnected() perform two checks
> instead of one: remote != null && isChannelReady(channel). This second
> change should allow NettyTransceiver to detect disconnect events more quickly.
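
The double read-lock acquisition described in the issue can be reproduced
outside of Avro with a small sketch. This is not the NettyTransceiver code:
the class name, the method names, and the use of a
java.util.concurrent.locks.ReentrantReadWriteLock named stateLock are all
assumptions made for illustration. The sketch uses tryLock() where the real
code would block, so that it reports the problem instead of hanging.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the transceiver's locking; not Avro source code.
class ReadLockHoldSketch {
    // Assumed: a reentrant read/write lock guards the connection state.
    private final ReentrantReadWriteLock stateLock = new ReentrantReadWriteLock();

    // Mimics the reconnect path: give up ONE read hold, then ask for the
    // write lock. tryLock() is used so the sketch returns false rather than
    // blocking forever as lock() would.
    private boolean tryReconnect() {
        stateLock.readLock().unlock();
        boolean acquired = stateLock.writeLock().tryLock();
        if (acquired) {
            stateLock.writeLock().unlock();
        }
        // Restore the read hold so the caller's unlock stays balanced.
        stateLock.readLock().lock();
        return acquired;
    }

    // Before the fix: the outer method and the inner helper both take the
    // read lock, so the thread holds it twice when reconnecting.
    boolean sendWithNestedReadLock() {
        stateLock.readLock().lock();          // outer call (like transceive)
        try {
            stateLock.readLock().lock();      // inner call (like writeDataPack)
            try {
                return tryReconnect();        // one read hold is still left
            } finally {
                stateLock.readLock().unlock();
            }
        } finally {
            stateLock.readLock().unlock();
        }
    }

    // After the fix: only the outer method takes the read lock; the inner
    // helper merely requires that its caller already holds it.
    boolean sendWithSingleReadLock() {
        stateLock.readLock().lock();
        try {
            return tryReconnect();
        } finally {
            stateLock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        ReadLockHoldSketch sketch = new ReadLockHoldSketch();
        System.out.println("nested read lock can reconnect: "
                + sketch.sendWithNestedReadLock());   // false
        System.out.println("single read lock can reconnect: "
                + sketch.sendWithSingleReadLock());   // true
    }
}
```

With two read holds, releasing one still leaves a reader registered, so the
write lock can never be granted to that thread. With a single hold, the
release brings the read count to zero and the write lock is available.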