Vladimir Prus created ZEPPELIN-5292:
---------------------------------------

             Summary: Deadlock in ConnectionManager
                 Key: ZEPPELIN-5292
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-5292
             Project: Zeppelin
          Issue Type: Bug
    Affects Versions: 0.9.0
            Reporter: Vladimir Prus
         Attachments: stacktrace-2021-03-18.txt

Our 0.9.0 install fairly regularly becomes unresponsive. Specifically, if I open the home page, I see the navigation bar, but nothing else shows up. The problem does not resolve itself, and there is no CPU usage whatsoever.

I have attached a stack trace from one such incident, in which nearly all threads are waiting inside ConnectionManager, like so:

{code:java}
"qtp733672688-15179" #15179 prio=5 os_prio=0 tid=0x00007fc1f0002000 nid=0x14103 waiting for monitor entry [0x00007fc1d48c7000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.zeppelin.socket.ConnectionManager.removeConnectionFromAllNote(ConnectionManager.java:175)
	- waiting to lock <0x00007fc5dbb0c5d8> (a java.util.HashMap)
{code}

and

{code:java}
"qtp733672688-15068" #15068 prio=5 os_prio=0 tid=0x00007fc358001000 nid=0x14069 waiting for monitor entry [0x00007fc15aae9000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.zeppelin.socket.ConnectionManager.addNoteConnection(ConnectionManager.java:108)
	- waiting to lock <0x00007fc5dbb0c5d8> (a java.util.HashMap)
{code}

The lock is held here:

{code:java}
"qtp733672688-10896" #10896 prio=5 os_prio=0 tid=0x00007fc2f4007800 nid=0x12661 waiting for monitor entry [0x00007fc395267000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.zeppelin.socket.NotebookSocket.send(NotebookSocket.java:70)
	- waiting to lock <0x00007fc5dbe1b050> (a org.apache.zeppelin.socket.NotebookSocket)
	at org.apache.zeppelin.socket.ConnectionManager.broadcast(ConnectionManager.java:247)
	at org.apache.zeppelin.socket.ConnectionManager.checkCollaborativeStatus(ConnectionManager.java:214)
	at org.apache.zeppelin.socket.ConnectionManager.removeConnectionFromNote(ConnectionManager.java:190)
	- locked <0x00007fc5dbb0c5d8> (a java.util.HashMap)
	at org.apache.zeppelin.socket.ConnectionManager.removeConnectionFromAllNote(ConnectionManager.java:178)
	- locked <0x00007fc5dbb0c5d8> (a java.util.HashMap)
{code}

Probably NotebookSocket.send takes a long time (in this trace it is itself waiting for the NotebookSocket monitor) while the thread still holds the HashMap lock, and that lock is blocking basically all other connections?

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
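To illustrate the suspected pattern, here is a minimal sketch of one common way to avoid calling a slow send while holding a shared map monitor: copy the recipients under the lock, then send after releasing it. The names SlowSocket and NoteConnections are hypothetical stand-ins, not Zeppelin classes, and this is not a proposed patch for ConnectionManager, just the general idea.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a socket whose send() may block for a long time.
interface SlowSocket {
    void send(String message);
}

// Hypothetical registry of connections per note; not the Zeppelin ConnectionManager.
final class NoteConnections {
    private final Map<String, List<SlowSocket>> byNote = new HashMap<>();

    void add(String noteId, SlowSocket socket) {
        synchronized (byNote) {
            byNote.computeIfAbsent(noteId, k -> new ArrayList<>()).add(socket);
        }
    }

    void remove(String noteId, SlowSocket socket) {
        synchronized (byNote) {
            List<SlowSocket> sockets = byNote.get(noteId);
            if (sockets != null) {
                sockets.remove(socket);
            }
        }
    }

    void broadcast(String noteId, String message) {
        List<SlowSocket> snapshot;
        synchronized (byNote) {
            // Hold the map monitor only long enough to copy the recipients.
            snapshot = new ArrayList<>(
                byNote.getOrDefault(noteId, Collections.emptyList()));
        }
        // Potentially slow I/O runs after the monitor has been released,
        // so add() and remove() on other threads are not blocked by it.
        for (SlowSocket socket : snapshot) {
            socket.send(message);
        }
    }
}
```

The trade-off in this sketch is that a socket removed concurrently may still receive one last message from an in-flight broadcast, since the send loop works on a snapshot.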