[ https://issues.apache.org/jira/browse/UIMA-5794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jerry Cwiklik updated UIMA-5794:
--------------------------------
Description:
The agent sometimes fails to stop running processes. In one case, the agent
left a few processes running even though their state had been set to
Stopping.
[Process Type=Pop DUCC ID=348 PID=17099 State=Stopping Resident Memory=361656320 GC Total=-1 GC Time=-1 Init Stats List Size:0 Reason: JPHasNoActiveJob] Exit Code=0
[Process Type=Pop DUCC ID=364 PID=593 State=Stopping Resident Memory=7382974464 GC Total=-1 GC Time=-1 Init Stats List Size:0 Reason: JPHasNoActiveJob] Exit Code=0
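Both records share a `Key=Value` layout, so a monitoring script can pick out
processes that are still reported as Stopping. The sketch below is only an
illustration: `parse_process_record` and `is_stuck_candidate` are hypothetical
names, and the record format is inferred from the two samples above rather
than from any DUCC specification.

```python
import re
from typing import Dict, Union

# Hypothetical field extractors, inferred from the two sample records.
# Only a few fields are pulled out; everything else is ignored.
_FIELDS = {
    "type": (re.compile(r"Process Type=(\S+)"), str),
    "ducc_id": (re.compile(r"DUCC ID=(\d+)"), int),
    "pid": (re.compile(r"\bPID=(\d+)"), int),
    "state": (re.compile(r"State=(\w+)"), str),
    "resident_memory": (re.compile(r"Resident\s+Memory=(-?\d+)"), int),
}


def parse_process_record(record: str) -> Dict[str, Union[int, str]]:
    """Extract selected fields from one process record.

    A line break between "Resident" and "Memory" (from mail wrapping) is
    tolerated because that pattern matches any run of whitespace.
    """
    fields: Dict[str, Union[int, str]] = {}
    for name, (pattern, convert) in _FIELDS.items():
        match = pattern.search(record)
        if match:
            fields[name] = convert(match.group(1))
    return fields


def is_stuck_candidate(record: str) -> bool:
    # A process still reported as Stopping is a candidate for SIGKILL.
    return parse_process_record(record).get("state") == "Stopping"
```

The issue does not state the unit of `Resident Memory`, so the sketch keeps
the raw number.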
For some reason, the agent did not send SIGKILL after SIGTERM failed to stop
these processes. Because they used a lot of memory, the OS out-of-memory (OOM)
killer ended up killing legitimate processes to keep the node from running out
of memory. Because the agent logs have since wrapped, the evidence of what
happened has been lost.

Proposed fix: modify the agent to keep sending SIGKILL to processes that remain
in the Stopping state after some time has elapsed. Perhaps the rogue process
detector can be tasked with that.
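The proposed fix amounts to a periodic sweep that escalates to SIGKILL. The
following is a minimal sketch of that idea in Python, assuming the agent can
report when a process enters Stopping (after SIGTERM) and when it finally
exits, and that something (for example a rogue-process check) calls `sweep()`
periodically. It is not DUCC code; the agent itself is Java, and
`StoppingProcessReaper`, `grace_seconds`, and the `mark_stopping` /
`mark_stopped` / `sweep` hooks are hypothetical names.

```python
import os
import signal
import time
from typing import Callable, Dict, List


class StoppingProcessReaper:
    """Hypothetical helper: escalate to SIGKILL for processes that stay in
    the Stopping state longer than `grace_seconds`.

    `kill` and `clock` are injectable so the logic can be exercised without
    signalling real processes; they default to os.kill and time.monotonic.
    """

    def __init__(
        self,
        grace_seconds: float = 60.0,
        kill: Callable[[int, int], None] = os.kill,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.grace_seconds = grace_seconds
        self._kill = kill
        self._clock = clock
        self._stopping_since: Dict[int, float] = {}

    def mark_stopping(self, pid: int) -> None:
        # Remember when the PID first entered Stopping (i.e. after SIGTERM).
        self._stopping_since.setdefault(pid, self._clock())

    def mark_stopped(self, pid: int) -> None:
        # The agent observed the exit; stop tracking this PID.
        self._stopping_since.pop(pid, None)

    def sweep(self) -> List[int]:
        """Send SIGKILL to every tracked PID whose grace period has expired.

        A PID stays tracked until mark_stopped() is called, so every later
        sweep sends SIGKILL again. Returns the PIDs signalled this sweep.
        """
        now = self._clock()
        signalled: List[int] = []
        for pid, since in list(self._stopping_since.items()):
            if now - since < self.grace_seconds:
                continue  # still within the SIGTERM grace period
            try:
                self._kill(pid, signal.SIGKILL)
                signalled.append(pid)
            except ProcessLookupError:
                # No such process any more: nothing left to kill.
                self._stopping_since.pop(pid, None)
        return signalled
```

Keeping a PID tracked until its exit is observed is what makes the sweep keep
re-sending SIGKILL, as the issue proposes. A production version would also
need to guard against PID reuse (for example by comparing the process start
time) before signalling; otherwise a recycled PID could receive SIGKILL.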
> DUCC: Agent fails to stop processes
> -----------------------------------
>
> Key: UIMA-5794
> URL: https://issues.apache.org/jira/browse/UIMA-5794
> Project: UIMA
> Issue Type: Bug
> Components: DUCC
> Reporter: Jerry Cwiklik
> Assignee: Jerry Cwiklik
> Priority: Major
> Fix For: 2.2.3-Ducc
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)