Jeff Tsang created ZEPPELIN-4986:
------------------------------------

             Summary: org.apache.zeppelin.server.ZeppelinServer thread won't be 
released
                 Key: ZEPPELIN-4986
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-4986
             Project: Zeppelin
          Issue Type: Bug
    Affects Versions: 0.9.0
            Reporter: Jeff Tsang
         Attachments: image-2020-08-07-12-19-18-212.png

I created 50 notebooks, each containing 4 paragraphs, and have a batch job 
that calls the API to run all paragraphs asynchronously every 10 minutes.  
Zeppelin runs from the Docker image released at the end of July (digest: 
58568bd6f10e, source commit: fe8fe9be7487791dc21094dd3cbef1d9190662cc).
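
For reference, a batch job like the one described could be sketched as below. 
This is only an illustration, not the actual job: it assumes the job uses 
Zeppelin's documented "run all paragraphs" REST endpoint 
(POST /api/notebook/job/{noteId}), and the function names, base URL and note 
IDs are hypothetical.

```python
import urllib.request


def build_run_all_request(base_url: str, note_id: str) -> urllib.request.Request:
    """Build (but do not send) a POST asking Zeppelin to run every paragraph of a note.

    Assumption: the batch job uses the documented /api/notebook/job/{noteId} endpoint.
    """
    url = f"{base_url.rstrip('/')}/api/notebook/job/{note_id}"
    return urllib.request.Request(url, method="POST")


def run_all(base_url: str, note_ids, timeout: float = 10.0) -> dict:
    """Send one run-all request per note and return the HTTP status for each note ID."""
    statuses = {}
    for note_id in note_ids:
        request = build_run_all_request(base_url, note_id)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            statuses[note_id] = response.status
    return statuses
```

A scheduler such as cron would then call run_all(...) with the 50 note IDs 
every 10 minutes.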

 

One day the server stopped working entirely. The root cause was that there 
were too many live processes, which exceeded the Linux PID limit.   After 
the server recovered, I monitored process usage with the "ps -eLfl" command 
and found that every time the batch job is triggered, Zeppelin creates 50+ 
threads to run the paragraphs.   These threads go into the sleep state and 
still occupy PID numbers even after the running jobs are done.

Here is part of the output of the ps command. All entries have the same 
parent PID but different LWPs (thread IDs), and all of the threads belong to 
the Java application org.apache.zeppelin.server.ZeppelinServer. 
!image-2020-08-07-12-19-18-212.png|width=1270,height=480!
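
To track the growth over time without reading the listing by eye, the ps 
output can be summarized per process. The following is a rough sketch, not 
part of the original measurement: it assumes the output starts with a header 
row that contains the PID and S (state) columns, as "ps -eLfl" prints on 
typical Linux systems, and the function names are hypothetical.

```python
import subprocess
from collections import Counter


def count_threads_by_pid(ps_output: str) -> dict:
    """Map each PID to a Counter of thread states, taken from `ps -eLfl` style output."""
    lines = [line for line in ps_output.splitlines() if line.strip()]
    if not lines:
        return {}
    header = lines[0].split()
    pid_col = header.index("PID")
    state_col = header.index("S")
    counts = {}
    for line in lines[1:]:
        # Limit the split so the trailing command column stays in one field.
        fields = line.split(None, len(header) - 1)
        pid = int(fields[pid_col])
        counts.setdefault(pid, Counter())[fields[state_col]] += 1
    return counts


def snapshot() -> dict:
    """Run ps once and summarize its output (requires a Linux ps that supports -L)."""
    result = subprocess.run(["ps", "-eLfl"], capture_output=True, text=True, check=True)
    return count_threads_by_pid(result.stdout)
```

Calling snapshot() before and after each batch run and comparing the totals 
for the ZeppelinServer PID shows whether the number of sleeping threads keeps 
growing.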

Because these threads are removed when Zeppelin is restarted, my current 
workaround is to restart the Zeppelin container periodically to keep the PID 
count from exceeding the maximum.  I am still looking for a long-term 
solution, though.   Alternatively, is there any way to remove these sleeping 
threads?
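
As a diagnostic aid, the leftover threads can be grouped by name to see which 
thread pool they come from. This is only a sketch under assumptions: it 
relies on the Linux /proc/<pid>/task/<tid>/comm files, it assumes the JVM 
publishes Java thread names to the kernel (not guaranteed for every JVM), and 
the function names are hypothetical. It does not remove any threads.

```python
import os
import re
from collections import Counter


def read_thread_names(pid: int) -> list:
    """Return the kernel-visible name of every thread of `pid` (Linux only)."""
    task_dir = f"/proc/{pid}/task"
    names = []
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "comm")) as handle:
                names.append(handle.read().strip())
        except OSError:
            continue  # the thread exited between listing and reading
    return names


def group_thread_names(names) -> Counter:
    """Count threads per name pattern, replacing each run of digits with '#'."""
    return Counter(re.sub(r"\d+", "#", name) for name in names)
```

For example, group_thread_names(read_thread_names(pid)).most_common(5) lists 
the five most frequent name patterns for the ZeppelinServer PID. The kernel 
truncates these names to 15 characters, so long pool names appear shortened.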

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
