Jerry Cwiklik created UIMA-5310:
-----------------------------------

             Summary: UIMA-DUCC: Agent may hang in cleanup code on startup
                 Key: UIMA-5310
                 URL: https://issues.apache.org/jira/browse/UIMA-5310
             Project: UIMA
          Issue Type: Bug
          Components: DUCC
            Reporter: Jerry Cwiklik
            Assignee: Jerry Cwiklik
             Fix For: future-DUCC


When an agent starts up, it checks whether any cgroup containers are left over
from a previous agent. This can happen if, for example, an agent fails to stop
a child process during a DUCC bounce. The agent tries to clean up such
processes by running kill -9 on every process associated with a container.
Once the kill is done, the code enters a loop to verify that each process has
been killed, checking cgroup.procs to confirm that it is gone. If a process is
still in the container, the agent waits a while and checks again. Typically
the process dies and cgroups accounting completes quickly, so the agent
removes the container and proceeds to run normally.
On rare occasions ducc_ling fails to run the kill -9 command and the process
persists. The verification loop then never exits and the agent hangs.
The agent should not block after the kill. If it finds a process still
running, it should report this fact and continue.
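
The proposed behavior could be sketched roughly as below. This is only an
illustration, not the actual agent code: the class CgroupCleanupSketch, its
method names, and the maxAttempts and sleepMillis parameters are all
hypothetical. It assumes cgroup.procs lists one PID per line and that a
missing file means the container is already gone.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not part of DUCC.
final class CgroupCleanupSketch {

    /** Returns the PIDs still listed in the given cgroup.procs file. */
    static List<String> remainingPids(Path cgroupProcs) throws IOException {
        try {
            return Files.readAllLines(cgroupProcs).stream()
                    .map(String::trim)
                    .filter(line -> !line.isEmpty())
                    .collect(Collectors.toList());
        } catch (NoSuchFileException e) {
            // Assumption: no cgroup.procs file means the container is gone.
            return Collections.emptyList();
        }
    }

    /**
     * Checks cgroup.procs at most maxAttempts times, sleeping sleepMillis
     * between checks. Instead of blocking forever, it reports any survivors
     * and returns them so the caller can continue startup.
     */
    static List<String> waitForEmpty(Path cgroupProcs, int maxAttempts, long sleepMillis)
            throws IOException, InterruptedException {
        List<String> pids = remainingPids(cgroupProcs);
        for (int attempt = 1; attempt < maxAttempts && !pids.isEmpty(); attempt++) {
            Thread.sleep(sleepMillis);
            pids = remainingPids(cgroupProcs);
        }
        if (!pids.isEmpty()) {
            // Report the fact and carry on rather than hang.
            System.err.println("WARN: processes still in container after kill: " + pids);
        }
        return pids;
    }
}
```

A caller would remove the container only when the returned list is empty, and
otherwise log the leftover PIDs and proceed with normal startup.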




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
