Jerry Cwiklik created UIMA-5528:
-----------------------------------
Summary: UIMA-DUCC: improve agent monitoring of cgroups
Key: UIMA-5528
URL: https://issues.apache.org/jira/browse/UIMA-5528
Project: UIMA
Issue Type: Improvement
Components: DUCC
Reporter: Jerry Cwiklik
Assignee: Jerry Cwiklik
Fix For: future-DUCC
Currently the agent validates node cgroups at startup only. On older versions
of RedHat, the cgroup memory subsystem has been observed to disappear because
of an OS bug. After that, all jobs fail because cgroup creation fails.
Modify the agent's node monitoring to test cgroup creation at regular
intervals. This check should be part of the node metrics collection. If
cgroup creation fails, the agent should mark the state of cgroups as 'Broken'.
This new state will be displayed by duccmon.
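
The proposed check could look roughly like the following. This is a minimal
sketch only, not the DUCC agent's actual code: the function names, the probe
directory name, and the 'Ok' state are hypothetical, and it assumes a cgroup v1
style hierarchy where creating a directory under the memory subsystem mount
creates a cgroup. Only the 'Broken' state name comes from this issue.

```python
import os
from pathlib import Path

BROKEN = "Broken"  # state name taken from the issue text
OK = "Ok"          # hypothetical name for the healthy state


def probe_cgroup_creation(memory_root, probe_name="ducc-probe"):
    """Try to create and remove a throwaway cgroup under memory_root.

    memory_root is assumed to be the mount point of the cgroup memory
    subsystem; the probe name is a placeholder.
    """
    probe_dir = Path(memory_root) / f"{probe_name}-{os.getpid()}"
    try:
        probe_dir.mkdir()  # on a cgroup filesystem, mkdir creates a cgroup
        probe_dir.rmdir()  # rmdir removes the empty cgroup again
    except OSError:
        # Covers a missing subsystem mount, permission errors, and similar.
        return BROKEN
    return OK


def collect_node_metrics(memory_root):
    """Hypothetical metrics collector that reports the cgroup state."""
    return {"cgroups": probe_cgroup_creation(memory_root)}
```

Calling collect_node_metrics() on each metrics interval would let the reported
state flip to 'Broken' as soon as the memory subsystem disappears, instead of
the problem surfacing only when a job fails to launch.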