[ https://issues.apache.org/jira/browse/UIMA-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jerry Cwiklik updated UIMA-5310:
--------------------------------
Description:
When an agent starts up, it checks whether any cgroup containers are left over
from a previous agent. This may happen if, for some reason, an agent fails to
stop a child process during a DUCC bounce, for example. The agent tries to
clean up such processes with kill -9. Once the kill is done, the code goes into
a loop, checking cgroup.procs to confirm that each process is gone. If a process
is still in a container, the agent waits a while and checks again. Typically the
process dies and cgroup accounting completes quickly; the agent then removes the
container and proceeds to run normally.
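The cgroup.procs check described above can be sketched as follows. This is a minimal Java sketch under stated assumptions, not the DUCC agent's actual code: the class `CgroupProcs` and its methods are hypothetical names. It relies only on the fact that a cgroup's `cgroup.procs` file lists one process ID per line, so an empty file means no processes remain in the container.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a DUCC API): reads the PIDs still attached
// to a cgroup container. cgroup.procs lists one process ID per line.
final class CgroupProcs {
    private CgroupProcs() {}

    static List<Long> readPids(Path procsFile) throws IOException {
        List<Long> pids = new ArrayList<>();
        for (String line : Files.readAllLines(procsFile)) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                pids.add(Long.parseLong(trimmed));
            }
        }
        return pids;
    }

    // The container is considered drained once cgroup.procs has no entries.
    static boolean isEmpty(Path procsFile) throws IOException {
        return readPids(procsFile).isEmpty();
    }
}
```

On a real host, `procsFile` would point at the leftover container's `cgroup.procs`; the mount location depends on how cgroups are configured and is not specified here.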
On rare occasions, ducc_ling fails to run the kill -9 command and the process
persists, leading to a hang.
An agent should not block after the kill. If it finds a process still
running, it should report this fact and continue.
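One way to keep the agent from blocking is to bound the verification loop. The Java sketch below is hypothetical: `BoundedCleanupCheck`, `awaitDrain`, and its parameters are illustrative names, not DUCC APIs. It polls a caller-supplied source of remaining PIDs at most `maxAttempts` times, reports whatever is still present, and returns instead of waiting indefinitely.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch (not DUCC code) of a bounded, non-blocking
// verification step after kill -9.
final class BoundedCleanupCheck {
    private BoundedCleanupCheck() {}

    /**
     * Checks remainingPids at most maxAttempts times (at least once),
     * sleeping delayMillis between checks. Returns the PIDs still present
     * after the last check; an empty list means the container drained.
     */
    static List<Long> awaitDrain(Supplier<List<Long>> remainingPids,
                                 int maxAttempts, long delayMillis)
            throws InterruptedException {
        List<Long> remaining = remainingPids.get();
        for (int attempt = 1; attempt < maxAttempts && !remaining.isEmpty(); attempt++) {
            Thread.sleep(delayMillis);
            remaining = remainingPids.get();
        }
        if (!remaining.isEmpty()) {
            // Report instead of blocking; the caller continues startup.
            System.err.println("WARN: processes still in container after kill -9: " + remaining);
        }
        return remaining;
    }
}
```

In this sketch the supplier would re-read the container's cgroup.procs on each call; what the agent does with a non-empty result (for example, leaving the container in place) is an assumption left to the caller.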
was:
When an agent starts up, it checks whether any cgroup containers are left over
from a previous agent. This may happen if, for some reason, an agent fails to
stop a child process during a DUCC bounce, for example. The agent tries to
clean up such processes by sending kill -9 to every process associated with a
container. Once the kill is done, the code goes into a loop to verify that the
process has been killed. It checks cgroup.procs to confirm that the process is
gone. If a process is still in a container, the agent waits a while and checks
again. Typically the process dies and cgroup accounting completes quickly; the
agent then removes the container and proceeds to run normally.
On rare occasions, ducc_ling fails to run the kill -9 command and the process
persists, leading to a hang.
An agent should not block after the kill. If it finds a process still
running, it should report this fact and continue.
> UIMA-DUCC: Agent may hang in cleanup code on startup
> ----------------------------------------------------
>
> Key: UIMA-5310
> URL: https://issues.apache.org/jira/browse/UIMA-5310
> Project: UIMA
> Issue Type: Bug
> Components: DUCC
> Reporter: Jerry Cwiklik
> Assignee: Jerry Cwiklik
> Fix For: future-DUCC
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)