Vladimir Prus created ZEPPELIN-5292:
---------------------------------------
Summary: Deadlock in ConnectionManager
Key: ZEPPELIN-5292
URL: https://issues.apache.org/jira/browse/ZEPPELIN-5292
Project: Zeppelin
Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Vladimir Prus
Attachments: stacktrace-2021-03-18.txt
Our 0.9.0 install becomes unresponsive fairly regularly. Specifically, when I
open the home page, I see the navigation bar, but nothing else shows up. The
problem does not resolve itself, and there is no CPU usage whatsoever.
I have attached a stack trace from one such incident, in which nearly all
threads are blocked inside ConnectionManager, like so:
{code:java}
"qtp733672688-15179" #15179 prio=5 os_prio=0 tid=0x00007fc1f0002000 nid=0x14103 waiting for monitor entry [0x00007fc1d48c7000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.zeppelin.socket.ConnectionManager.removeConnectionFromAllNote(ConnectionManager.java:175)
    - waiting to lock <0x00007fc5dbb0c5d8> (a java.util.HashMap)
{code}
and
{code:java}
"qtp733672688-15068" #15068 prio=5 os_prio=0 tid=0x00007fc358001000 nid=0x14069 waiting for monitor entry [0x00007fc15aae9000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.zeppelin.socket.ConnectionManager.addNoteConnection(ConnectionManager.java:108)
    - waiting to lock <0x00007fc5dbb0c5d8> (a java.util.HashMap)
{code}
The lock is held here:
{code:java}
"qtp733672688-10896" #10896 prio=5 os_prio=0 tid=0x00007fc2f4007800 nid=0x12661 waiting for monitor entry [0x00007fc395267000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.zeppelin.socket.NotebookSocket.send(NotebookSocket.java:70)
    - waiting to lock <0x00007fc5dbe1b050> (a org.apache.zeppelin.socket.NotebookSocket)
    at org.apache.zeppelin.socket.ConnectionManager.broadcast(ConnectionManager.java:247)
    at org.apache.zeppelin.socket.ConnectionManager.checkCollaborativeStatus(ConnectionManager.java:214)
    at org.apache.zeppelin.socket.ConnectionManager.removeConnectionFromNote(ConnectionManager.java:190)
    - locked <0x00007fc5dbb0c5d8> (a java.util.HashMap)
    at org.apache.zeppelin.socket.ConnectionManager.removeConnectionFromAllNote(ConnectionManager.java:178)
    - locked <0x00007fc5dbb0c5d8> (a java.util.HashMap)
{code}
My guess is that NotebookSocket.send takes a long time, and that it is called
while the caller holds the ConnectionManager lock that essentially all other
connections need?
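To illustrate the suspected pattern and one common way around it, here is a
minimal sketch, not Zeppelin's actual code and not a proposed patch. The names
Peer and SnapshotBroadcaster are hypothetical stand-ins for NotebookSocket and
ConnectionManager. The idea is to copy the list of connections while holding
the map lock, then call the potentially slow send() only after the lock has
been released:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a connection whose send() may be slow or block.
interface Peer {
    void send(String message);
}

// Hypothetical stand-in for a connection registry guarded by one monitor.
class SnapshotBroadcaster {
    // All access goes through synchronized (notePeers), like the HashMap
    // monitor seen in the thread dump.
    private final Map<String, List<Peer>> notePeers = new HashMap<>();

    void add(String noteId, Peer peer) {
        synchronized (notePeers) {
            notePeers.computeIfAbsent(noteId, k -> new ArrayList<>()).add(peer);
        }
    }

    void remove(String noteId, Peer peer) {
        synchronized (notePeers) {
            List<Peer> peers = notePeers.get(noteId);
            if (peers != null) {
                peers.remove(peer);
            }
        }
    }

    void broadcast(String noteId, String message) {
        List<Peer> snapshot;
        // Hold the lock only long enough to copy the current peer list.
        synchronized (notePeers) {
            List<Peer> peers = notePeers.get(noteId);
            snapshot = (peers == null) ? new ArrayList<>() : new ArrayList<>(peers);
        }
        // The map lock is released here, so a slow send() cannot stall
        // threads that are adding or removing connections.
        for (Peer peer : snapshot) {
            peer.send(message);
        }
    }
}
```

The trade-off in this sketch is that a peer removed after the snapshot was
taken may still receive one last message, so send() has to tolerate a
connection that is already closed.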