[ https://issues.apache.org/jira/browse/AVRO-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doug Cutting resolved AVRO-1293. -------------------------------- Resolution: Fixed Fix Version/s: 1.7.5 I committed this. Thanks, James! > NettyTransceiver: Deadlock can occur when different threads call getChannel() > and close() concurrently > ------------------------------------------------------------------------------------------------------ > > Key: AVRO-1293 > URL: https://issues.apache.org/jira/browse/AVRO-1293 > Project: Avro > Issue Type: Bug > Components: java > Affects Versions: 1.7.4 > Reporter: James Baldassari > Assignee: James Baldassari > Fix For: 1.7.5 > > Attachments: AVRO-1293.patch > > > While testing patches for AVRO-1292 I stumbled upon a deadlock in > NettyTransceiver that I've never seen before. It happened when close() was > called at roughly the same time that another thread was trying to invoke an > RPC. Here are the stack traces for the two threads that were involved in the > deadlock: > {code} > "Thread 1: Writer": > at > org.apache.avro.ipc.NettyTransceiver.getChannel(NettyTransceiver.java:267) > - waiting to lock <0x000000067b1a7bc8> (a java.lang.Object) > at > org.apache.avro.ipc.NettyTransceiver.getRemoteName(NettyTransceiver.java:391) > at org.apache.avro.ipc.Requestor.writeHandshake(Requestor.java:202) > at org.apache.avro.ipc.Requestor.access$3(Requestor.java:198) > at org.apache.avro.ipc.Requestor$Request.getBytes(Requestor.java:478) > at org.apache.avro.ipc.Requestor.request(Requestor.java:147) > at org.apache.avro.ipc.Requestor.request(Requestor.java:101) > at > org.apache.avro.ipc.specific.SpecificRequestor.invoke(SpecificRequestor.java:88) > {code} > {code} > "Thread 2: Closer": > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x000000067aedea90> (a > java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178) > at > java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:807) > at > org.apache.avro.ipc.NettyTransceiver.disconnect(NettyTransceiver.java:307) > at > org.apache.avro.ipc.NettyTransceiver.access$2(NettyTransceiver.java:293) > at > org.apache.avro.ipc.NettyTransceiver$NettyClientAvroHandler.handleUpstream(NettyTransceiver.java:542) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:783) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.cleanup(FrameDecoder.java:348) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.channelClosed(FrameDecoder.java:232) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:98) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559) > at > org.jboss.netty.channel.Channels.fireChannelClosed(Channels.java:404) > at > org.jboss.netty.channel.socket.nio.NioWorker.close(NioWorker.java:602) > at > org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:101) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:771) > at > org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:60) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:582) > at org.jboss.netty.channel.Channels.close(Channels.java:720) > at > org.jboss.netty.channel.AbstractChannel.close(AbstractChannel.java:200) > at > org.jboss.netty.channel.ChannelFutureListener$2.operationComplete(ChannelFutureListener.java:57) > at > org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:381) > at > org.jboss.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:367) > at > org.jboss.netty.channel.DefaultChannelFuture.cancel(DefaultChannelFuture.java:356) > at > org.apache.avro.ipc.NettyTransceiver.disconnect(NettyTransceiver.java:301) > - locked <0x000000067b1a7bc8> (a java.lang.Object) > at > org.apache.avro.ipc.NettyTransceiver.close(NettyTransceiver.java:380) > at java.lang.Thread.run(Thread.java:662) > {code} > Both of these methods acquire two different locks, the {{stateLock}} write > lock and the monitor of {{channelFutureLock}}. The problem is that, under > certain circumstances, these methods will acquire the locks in different > orders which results in the deadlock. The sequence of events is something > like this: > 1. Thread 2 calls {{close()}} -> {{disconnect(true, true, null)}} > 2. Inside a {{synchronized(channelFutureLock)}} block the {{disconnect}} > method calls {{channelFuture.cancel()}}. Normally this would trigger an > asynchronous event which would fire in a separate thread, but in this case > Netty fires the event in the same thread, and > {{NettyClientAvroHandler#handleUpstream(...)}} is invoked. > 3. Thread 1 calls {{getChannel()}} and obtains the write lock on > {{stateLock}}. It then tries to synchronize on {{channelFutureLock}} but > blocks because Thread 2 has already locked its monitor. > 4. Thread 2 calls the {{disconnect}} method from the {{handleUpstream}} > method but blocks while attempting to acquire the {{stateLock}} write lock > because Thread 1 has already locked it. > There are a couple of fairly simple solutions to this problem. The first is > that the {{disconnect}} method should call {{channelFuture.cancel()}} > _outside_ of the {{synchronized(channelFutureLock)}} block. Another solution > would be to use a ExecutorService to guarantee that > {{channelFuture.cancel()}} is always called in a separate thread. I think I > prefer the first solution because it's simpler and does not require > introducing a thread pool. I'll work on a patch for that solution. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira