[ https://issues.apache.org/jira/browse/ZOOKEEPER-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15904084#comment-15904084 ]
ASF GitHub Bot commented on ZOOKEEPER-2711:
-------------------------------------------
Github user afine commented on a diff in the pull request:
https://github.com/apache/zookeeper/pull/186#discussion_r105300714
--- Diff: src/java/main/org/apache/zookeeper/server/NettyServerCnxnFactory.java ---
@@ -162,7 +162,7 @@ public void messageReceived(ChannelHandlerContext ctx, MessageEvent e)
                        + " from " + ctx.getChannel());
                }
                NettyServerCnxn cnxn = (NettyServerCnxn)ctx.getAttachment();
-               synchronized(cnxn) {
+               synchronized(cnxn.getRpcLock()) {
                    processMessage(e, cnxn);
--- End diff ---
>> Now one thread can be in processMessage while another thread is getting stats about the connection. Is that ok?
> I believe this is OK. We can receive two concurrent stat commands, but we only process one of them at a time. I'm also not a Netty wizard, so I could be wildly wrong :)
This should be OK since all fields printed by `synchronized void dumpConnectionInfo(PrintWriter pwriter, boolean brief)` are only updated in `synchronized` methods.
Another option would be to put a shared lock on the command's execution (shared with any other 4LW that acquires a lock on a connection other than its own; I think `cons` is the only other one). That might impact 4LW performance slightly, but it would avoid adding another lock to `NettyServerCnxn`.
I think I like your way better.
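To make that concrete, here is a minimal sketch of the extra-lock approach from the diff above. Only `getRpcLock()` and `dumpConnectionInfo(PrintWriter, boolean)` come from this thread; the surrounding class and method shapes are assumptions for illustration, not the actual ZooKeeper code:

```java
// Minimal sketch (assumed shape, not the actual patch): the Netty I/O path
// serializes message processing on a dedicated monitor, while 4LW commands
// such as `stat` keep synchronizing on the connection object itself, so the
// two paths can no longer deadlock against each other.
import java.io.PrintWriter;

class CnxnSketch {
    // Dedicated monitor for the Netty message-processing path
    // (the name follows getRpcLock() in the diff above).
    private final Object rpcLock = new Object();

    Object getRpcLock() {
        return rpcLock;
    }

    // Connection stats stay guarded by the connection's own monitor,
    // matching the synchronized dumpConnectionInfo(PrintWriter, boolean).
    synchronized void dumpConnectionInfo(PrintWriter pwriter, boolean brief) {
        pwriter.println("...connection stats...");
    }
}

class ChannelHandlerSketch {
    // Stand-in for CnxnChannelHandler.messageReceived(...)
    void messageReceived(CnxnSketch cnxn, Object msg) {
        // Was: synchronized (cnxn) { ... }
        synchronized (cnxn.getRpcLock()) {
            processMessage(msg, cnxn);
        }
    }

    private void processMessage(Object msg, CnxnSketch cnxn) {
        // dispatch the request / run the 4LW command, etc.
    }
}
```

With this split, a `stat` thread that holds a connection's own monitor can no longer block (or be blocked by) the Netty worker sitting in `processMessage`, since that worker now holds the dedicated `rpcLock` instead.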
> Deadlock between concurrent 4LW commands that iterate over connections with Netty server
> ----------------------------------------------------------------------------------------
>
> Key: ZOOKEEPER-2711
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2711
> Project: ZooKeeper
> Issue Type: Bug
> Reporter: Josh Elser
> Priority: Critical
>
> Observed the following issue in some $dayjob testing environments. Line
> numbers are a little off compared to master/branch-3.5, but I did confirm the
> same issue exists there.
> With the NettyServerCnxnFactory, before a request is dispatched, the code
> synchronizes on the {{NettyServerCnxn}} object. However, some 4LW commands
> (like {{stat}}) also synchronize on each {{ServerCnxn}} object in order to
> (safely) iterate over its internal contents and generate the necessary debug
> message. As such, two concurrent {{stat}} commands can each lock their own
> {{NettyServerCnxn}} object and then block waiting to lock the other's
> {{ServerCnxn}} in the {{StatCommand}}, deadlocking.
> {noformat}
> "New I/O worker #55":
> at
> org.apache.zookeeper.server.ServerCnxn.dumpConnectionInfo(ServerCnxn.java:407)
> - waiting to lock <0x00000000fabc01b8> (a
> org.apache.zookeeper.server.NettyServerCnxn)
> at
> org.apache.zookeeper.server.NettyServerCnxn$StatCommand.commandRun(NettyServerCnxn.java:478)
> at
> org.apache.zookeeper.server.NettyServerCnxn$CommandThread.run(NettyServerCnxn.java:311)
> at
> org.apache.zookeeper.server.NettyServerCnxn$CommandThread.start(NettyServerCnxn.java:306)
> at
> org.apache.zookeeper.server.NettyServerCnxn.checkFourLetterWord(NettyServerCnxn.java:677)
> at
> org.apache.zookeeper.server.NettyServerCnxn.receiveMessage(NettyServerCnxn.java:790)
> at
> org.apache.zookeeper.server.NettyServerCnxnFactory$CnxnChannelHandler.processMessage(NettyServerCnxnFactory.java:211)
> at
> org.apache.zookeeper.server.NettyServerCnxnFactory$CnxnChannelHandler.messageReceived(NettyServerCnxnFactory.java:135)
> - locked <0x00000000fab68178> (a
> org.apache.zookeeper.server.NettyServerCnxn)
> at
> org.jboss.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:88)
> at
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
> at
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
> at
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
> at
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
> at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
> at
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
> at
> org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
> at
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
> at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
> at
> org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
> at
> org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> "New I/O worker #51":
> at
> org.apache.zookeeper.server.ServerCnxn.dumpConnectionInfo(ServerCnxn.java:407)
> - waiting to lock <0x00000000fab68178> (a
> org.apache.zookeeper.server.NettyServerCnxn)
> at
> org.apache.zookeeper.server.NettyServerCnxn$StatCommand.commandRun(NettyServerCnxn.java:478)
> at
> org.apache.zookeeper.server.NettyServerCnxn$CommandThread.run(NettyServerCnxn.java:311)
> at
> org.apache.zookeeper.server.NettyServerCnxn$CommandThread.start(NettyServerCnxn.java:306)
> at
> org.apache.zookeeper.server.NettyServerCnxn.checkFourLetterWord(NettyServerCnxn.java:677)
> at
> org.apache.zookeeper.server.NettyServerCnxn.receiveMessage(NettyServerCnxn.java:790)
> at
> org.apache.zookeeper.server.NettyServerCnxnFactory$CnxnChannelHandler.processMessage(NettyServerCnxnFactory.java:211)
> at
> org.apache.zookeeper.server.NettyServerCnxnFactory$CnxnChannelHandler.messageReceived(NettyServerCnxnFactory.java:135)
> - locked <0x00000000fabc01b8> (a
> org.apache.zookeeper.server.NettyServerCnxn)
> at
> org.jboss.netty.channel.SimpleChannelHandler.handleUpstream(SimpleChannelHandler.java:88)
> at
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
> at
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
> at
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
> at
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
> at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
> at
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
> at
> org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
> at
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
> at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
> at
> org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
> at
> org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> It would appear that the synchronization on the {{NettyServerCnxn}} in
> {{NettyServerCnxnFactory}} is to blame (and I can see why it was done
> originally). I think we can just use a different Object (and monitor) to
> provide mutual exclusion at the Netty layer (and avoid synchronization issues
> at the "application" layer).
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)