Jerry Cwiklik created UIMA-3685:
-----------------------------------
Summary: DUCC's rogue process detector not reporting JPs parented
by init (1)
Key: UIMA-3685
URL: https://issues.apache.org/jira/browse/UIMA-3685
Project: UIMA
Issue Type: Bug
Components: DUCC
Affects Versions: 1.0-Ducc
Reporter: Jerry Cwiklik
Assignee: Jerry Cwiklik
It has been observed that a JP launched by DUCC hung while writing out its core
dump because a quota had been exceeded. The process was still alive, blocked in
write(). The core dump caused a change in process ownership: the OS changed the
owner from <user> to init (1). The process still had its cgroup intact because
it was still running.
While looking for rogue processes, the rogue process detector checks whether a
process belongs to a cgroup. If it does, the detector assumes that the process
is valid and not rogue. As a result, the hung JP, which was still in its
cgroup, was not reported.
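
The effect of this check can be illustrated with a minimal Python sketch. It
is not DUCC's actual code: the Proc record, its field names, the pids, and the
fallback parent test for processes outside a cgroup are all assumptions made
for illustration. Only the cgroup short-circuit comes from the description
above.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    # Hypothetical process record; field names are illustrative, not DUCC's.
    pid: int
    ppid: int
    in_cgroup: bool


def is_rogue_current(proc: Proc, ducc_pid: int) -> bool:
    # Behavior described in the report: cgroup membership short-circuits
    # the check, so the process is assumed valid.
    if proc.in_cgroup:
        return False
    # Assumed fallback for processes outside any cgroup.
    return proc.ppid != ducc_pid


# A hung JP that is now parented by init (1) but still sits in its cgroup.
hung_jp = Proc(pid=4242, ppid=1, in_cgroup=True)
print(is_rogue_current(hung_jp, ducc_pid=100))  # False: the hung JP is missed
```

The early return is what hides the hung JP: its parent is no longer ducc, but
the check never gets that far.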
The detector should not check whether the process belongs to a cgroup when
determining whether it is rogue. Any process that does not have ducc as its
ancestor should be treated as rogue and reported as such for subsequent
cleanup. The exception is processes belonging to users with reservations
on the node.
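
One way the proposed rule could look, as a hedged Python sketch rather than
the actual fix: the parent_of map, the function names, the pids, and the user
names are hypothetical, and a real detector would build the parent map from
the OS process table. Cgroup membership is deliberately not consulted.

```python
from typing import Dict, Set


def descends_from(pid: int, ancestor: int, parent_of: Dict[int, int]) -> bool:
    """Walk the parent chain from pid; True if ancestor (or pid itself) matches."""
    seen: Set[int] = set()
    while pid not in seen:  # guard against a malformed, cyclic parent map
        if pid == ancestor:
            return True
        seen.add(pid)
        if pid not in parent_of:  # reached the top of the chain, e.g. init
            return False
        pid = parent_of[pid]
    return False


def is_rogue(pid: int, owner: str, parent_of: Dict[int, int],
             ducc_pid: int, reserved_users: Set[str]) -> bool:
    # Exception from the report: users with a reservation on the node.
    if owner in reserved_users:
        return False
    # Otherwise, rogue means "ducc is not an ancestor"; cgroups are ignored.
    return not descends_from(pid, ducc_pid, parent_of)


# Hypothetical process table: child pid -> parent pid (init is pid 1).
parent_of = {100: 1, 200: 100, 4242: 1, 300: 1}
reserved = {"resuser"}

print(is_rogue(200, "someuser", parent_of, 100, reserved))   # False: child of ducc
print(is_rogue(4242, "someuser", parent_of, 100, reserved))  # True: parented by init
print(is_rogue(300, "resuser", parent_of, 100, reserved))    # False: reservation
```

Under these assumptions the hung JP (pid 4242 in the example) is reported even
though it may still belong to a cgroup.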