Jeff Tsang created ZEPPELIN-4986:
------------------------------------
Summary: org.apache.zeppelin.server.ZeppelinServer thread won't be
released
Key: ZEPPELIN-4986
URL: https://issues.apache.org/jira/browse/ZEPPELIN-4986
Project: Zeppelin
Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Jeff Tsang
Attachments: image-2020-08-07-12-19-18-212.png
I created 50 notebooks, each containing 4 paragraphs, and have a batch job
that calls the API to run all paragraphs asynchronously every 10 minutes.
Zeppelin runs from the Docker image released at the end of July (digest:
58568bd6f10e, source commit: fe8fe9be7487791dc21094dd3cbef1d9190662cc).
One day the server stopped working entirely. The root cause was that there
were too many live processes, which exceeded the Linux PID limit. After the
server recovered, I monitored process usage with the "ps -eLfl" command and
found that every time the batch job is triggered, Zeppelin creates 50+
threads to run paragraphs. These threads go into the sleep state and still
occupy PID numbers even after the running jobs are done.
Here is part of the output of the ps command. All entries have the same
parent PID but different LWPs (thread IDs), and all of the threads belong to
the Java application org.apache.zeppelin.server.ZeppelinServer.
!image-2020-08-07-12-19-18-212.png|width=1270,height=480!
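For anyone who wants to track the same symptom without reading the listing by
eye, here is a minimal sketch that counts threads (LWP rows) per PID from
"ps -eLf" or "ps -eLfl" output. The function name threads_per_process and the
sample rows are made up for illustration and are not taken from the
screenshot; the only assumption about ps is that the header contains PID and
LWP columns, which the -L option provides.

```python
from collections import Counter


def threads_per_process(ps_output: str) -> Counter:
    """Count thread rows per PID in the text output of `ps -eLf` / `ps -eLfl`.

    The header line is used to locate the PID and LWP columns, so the
    function works with either option set. Each data row is one thread.
    """
    lines = [ln for ln in ps_output.splitlines() if ln.strip()]
    if not lines:
        return Counter()

    header = lines[0].split()
    pid_col = header.index("PID")   # raises ValueError if the column is missing
    lwp_col = header.index("LWP")   # present only when ps is run with -L

    counts = Counter()
    for line in lines[1:]:
        fields = line.split()
        if len(fields) <= max(pid_col, lwp_col):
            continue  # skip truncated or malformed rows
        counts[int(fields[pid_col])] += 1
    return counts


# Hypothetical sample in `ps -eLf` layout; the PIDs and times are invented.
SAMPLE = """\
UID        PID  PPID   LWP  C NLWP STIME TTY          TIME CMD
zeppelin   101     1   101  0    3 12:00 ?        00:00:05 java org.apache.zeppelin.server.ZeppelinServer
zeppelin   101     1   150  0    3 12:00 ?        00:00:00 java org.apache.zeppelin.server.ZeppelinServer
zeppelin   101     1   151  0    3 12:00 ?        00:00:00 java org.apache.zeppelin.server.ZeppelinServer
root       200     1   200  0    1 12:00 ?        00:00:00 sleep 60
"""

counts = threads_per_process(SAMPLE)
```

To use it on a live system, pass in the captured output of the ps command
instead of SAMPLE and log counts.most_common(1) after each batch run; a thread
count that grows by roughly 50 per run and never drops would match the
behaviour described in this report.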
Because these threads are removed when Zeppelin is restarted, my current
workaround is to restart the Zeppelin container periodically to keep the PID
count from exceeding the maximum. I am still looking for a long-term solution
to this issue. Alternatively, is there any way to remove these sleeping threads?
--
This message was sent by Atlassian Jira
(v8.3.4#803005)