[jira] [Updated] (AVRO-1013) NettyTransceiver can hang after server restart

James Baldassari (Updated) (JIRA) Tue, 07 Feb 2012 19:43:44 -0800

     [ https://issues.apache.org/jira/browse/AVRO-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


James Baldassari updated AVRO-1013:
-----------------------------------

       Resolution: Fixed
    Fix Version/s: 1.6.2
           Status: Resolved  (was: Patch Available)

Committed
                
> NettyTransceiver can hang after server restart
> ----------------------------------------------
>
>                 Key: AVRO-1013
>                 URL: https://issues.apache.org/jira/browse/AVRO-1013
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.6.1
>            Reporter: James Baldassari
>            Assignee: James Baldassari
>            Priority: Blocker
>             Fix For: 1.6.2
>
>         Attachments: AVRO-1013.patch
>
>
> I ran into a very specific scenario today which can lead to NettyTransceiver 
> hanging indefinitely:
> # Start up a NettyServer
> # Initialize a NettyTransceiver and SpecificRequestor
> # Execute an RPC to establish the connection/handshake with the server
> # Shut down the server
> # Immediately execute another RPC
> After Step 4, NettyTransceiver will detect that the connection has been 
> closed and call NettyTransceiver#disconnect(boolean, boolean, Throwable), 
> which sets 'remote' to null, indicating to Requestor that the 
> NettyTransceiver is now disconnected.  However, if an RPC is executed just 
> after the server has closed its socket (Step 5) and before disconnect() has 
> been called, NettyTransceiver may still try to send this RPC because 'remote' 
> has not yet been set to null.  This race condition is normally ok because 
> NettyTransceiver#getChannel() will detect that the socket has been closed and 
> then try to reestablish the connection.  Unfortunately, in this scenario 
> getChannel() blocks forever when it attempts to acquire the write lock 
> because the read lock has been acquired twice rather than once as 
> getChannel() expects.  The read lock is acquired once by 
> transceive(List<ByteBuffer>, Callback<List<ByteBuffer>>) and again by 
> writeDataPack(NettyDataPack).
> The fix is fairly simple.  The writeDataPack(NettyDataPack) method (which is 
> private) no longer acquires the read lock; instead, its contract specifies 
> that the read lock must be acquired before calling this method.  This change 
> prevents the read lock from being acquired more than once by any single 
> thread.  
> Another change is to have NettyTransceiver#isConnected() perform two checks 
> instead of one: remote != null && isChannelReady(channel).  This second 
> change should allow NettyTransceiver to detect disconnect events more quickly.
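
The locking problem described above can be reproduced in isolation. The following is a minimal sketch, not the actual NettyTransceiver code: it assumes the lock behaves like java.util.concurrent.locks.ReentrantReadWriteLock, and the class and method names (ReadLockUpgradeDemo, upgradeSucceeds, upgradeWithReadHolds) are hypothetical. It releases one read hold and then tries to take the write lock, as getChannel() is described as doing, but uses a timed tryLock so that the failure is reported instead of hanging.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class; none of these names come from Avro.
final class ReadLockUpgradeDemo {

    /**
     * Mimics the upgrade pattern from the description: release ONE read hold,
     * then try to take the write lock. A timed tryLock stands in for lock()
     * so the demo returns false where the real code would block forever.
     */
    static boolean upgradeSucceeds(ReentrantReadWriteLock lock) {
        lock.readLock().unlock(); // give up the single hold the caller is assumed to have
        boolean acquired = false;
        try {
            acquired = lock.writeLock().tryLock(200, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            if (acquired) {
                lock.writeLock().unlock();
            }
            lock.readLock().lock(); // restore the hold so the caller can unlock symmetrically
        }
        return acquired;
    }

    /** Takes the read lock `holds` times on the current thread, then attempts the upgrade. */
    static boolean upgradeWithReadHolds(int holds) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        for (int i = 0; i < holds; i++) {
            lock.readLock().lock();
        }
        try {
            return upgradeSucceeds(lock);
        } finally {
            for (int i = 0; i < holds; i++) {
                lock.readLock().unlock();
            }
        }
    }

    public static void main(String[] args) {
        // One hold: the state the upgrade code expects.
        System.out.println("one read hold:  " + upgradeWithReadHolds(1));
        // Two holds: the state reached when both the outer and inner method lock.
        System.out.println("two read holds: " + upgradeWithReadHolds(2));
    }
}
```

With a single read hold the write lock is obtained; with two holds on the same thread, one read hold remains after the release, so the write lock can never be granted. This is why making the inner method rely on its caller's read lock, rather than taking its own, removes the hang.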

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
