Jerry Cwiklik created UIMA-5567:
-----------------------------------
Summary: UIMA-DUCC: Agent should be able to recover its state after restart
Key: UIMA-5567
URL: https://issues.apache.org/jira/browse/UIMA-5567
Project: UIMA
Issue Type: Improvement
Components: DUCC
Reporter: Jerry Cwiklik
Assignee: Jerry Cwiklik
Fix For: future-DUCC
Currently it is not possible to bounce an agent. After launching a child
process, an agent adds an entry to its Process Inventory and calls waitFor() on
the Process handle to detect child termination. When an agent restarts, it
loses all of its children and has no means to recover its inventory.
The proposal is to change this behavior so that agents can be bounced and
subsequently recover their child processes. A bounce may be needed, for
example, to update agent code.
An agent has two options for recovering its child processes, depending on
whether cgroups are available.
If cgroups are enabled, an agent on startup will read all PIDs from the
cgroup.procs file. These PIDs are the child processes currently running on the
node. The agent will create a skeleton inventory entry for each PID and fill in
the details when the OR state is received, using each PID to find the matching
process in the OR state. After the inventory is recovered, the timer-based
inventory update will fetch PIDs from the cgroup.procs file again and reconcile
them with the inventory. To detect child process termination, the agent will
compare the PIDs in its inventory against the PIDs from cgroup.procs. If a PID
is in the inventory but not in cgroup.procs, the agent will mark that process
as Stopped if its deallocate flag is true, or as Failed if the deallocate flag
is false. Any AP process that is no longer running will be marked as Stopped.
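A minimal sketch of the cgroup-based recovery described above, written in
Python for brevity (the DUCC agent itself is Java). It assumes a hypothetical
ProcessEntry record with is_ap, deallocate and state fields, and a
hypothetical OR-state shape: a dict mapping each PID to its process details.
It reads PIDs from a cgroup.procs file, builds skeleton entries, fills them in
from the OR state by PID, and reconciles the inventory against the live PIDs.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Set


@dataclass
class ProcessEntry:
    """Hypothetical inventory entry (field names are illustrative, not DUCC's)."""
    pid: int
    is_ap: bool = False       # AP process (filled in from the OR state)
    deallocate: bool = False  # deallocate flag (filled in from the OR state)
    state: str = "Running"


def read_cgroup_pids(procs_file: Path) -> Set[int]:
    """Read the PIDs listed in a cgroup.procs file (one PID per line)."""
    return {int(tok) for tok in procs_file.read_text().split()}


def recover_skeleton(pids: Set[int]) -> Dict[int, ProcessEntry]:
    """On agent startup: one skeleton entry per PID, details not yet known."""
    return {pid: ProcessEntry(pid=pid) for pid in pids}


def fill_from_or_state(inventory: Dict[int, ProcessEntry],
                       or_state: Dict[int, dict]) -> None:
    """Fill in skeleton entries, using the PID to find the matching OR process."""
    for pid, entry in inventory.items():
        info = or_state.get(pid)
        if info is not None:
            entry.is_ap = info.get("is_ap", False)
            entry.deallocate = info.get("deallocate", False)


def reconcile(inventory: Dict[int, ProcessEntry], live_pids: Set[int]) -> None:
    """Timer-driven check: mark entries whose PID left cgroup.procs."""
    for pid, entry in inventory.items():
        if entry.state != "Running" or pid in live_pids:
            continue  # already terminal, or still running
        if entry.is_ap or entry.deallocate:
            entry.state = "Stopped"
        else:
            entry.state = "Failed"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        procs = Path(d) / "cgroup.procs"  # stand-in for a real cgroup file
        procs.write_text("101\n102\n103\n104\n")
        inventory = recover_skeleton(read_cgroup_pids(procs))
        fill_from_or_state(inventory, {101: {"deallocate": True},
                                       103: {"is_ap": True},
                                       104: {"deallocate": False}})
        procs.write_text("102\n")  # later: 101, 103 and 104 have exited
        reconcile(inventory, read_cgroup_pids(procs))
        for pid in sorted(inventory):
            print(pid, inventory[pid].state)
```

Run as a script, this prints "101 Stopped", "102 Running", "103 Stopped" and
"104 Failed" on separate lines: 101 had deallocate set, 103 is an AP, and 104
exited without deallocate.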
If cgroups are not enabled, an agent will recover its inventory from the OR
state. In this mode, the agent will disable its Rogue Process Detector and will
not attempt to detect alien processes. The timer-based inventory update will
fetch PIDs from the OS (using the ps command) and reconcile them with the
inventory. To detect child process termination, the agent will compare the PIDs
in its inventory against the PIDs obtained from the OS. If a PID is in the
inventory but no longer reported by the OS, the agent will mark that process as
Stopped if its deallocate flag is true, or as Failed if the deallocate flag is
false. Any AP process that is no longer running will be marked as Stopped.
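For the non-cgroup mode, a similar Python sketch fetches PIDs with
`ps -e -o pid=` (assuming procps-style ps flags) and applies the same
Stopped/Failed rule. The inventory here is a hypothetical dict keyed by PID,
standing in for entries recovered from the OR state; the is_ap and deallocate
keys are illustrative assumptions.

```python
import subprocess
from typing import Dict, Set


def parse_ps_pids(output: str) -> Set[int]:
    """Parse `ps -e -o pid=` output: one PID per line, no header row."""
    return {int(tok) for tok in output.split()}


def fetch_os_pids() -> Set[int]:
    """Ask the OS for every running PID (assumes a procps-style ps)."""
    result = subprocess.run(["ps", "-e", "-o", "pid="],
                            capture_output=True, text=True, check=True)
    return parse_ps_pids(result.stdout)


def exited_children(inventory: Dict[int, dict],
                    os_pids: Set[int]) -> Dict[int, str]:
    """Return the new state for each inventoried PID the OS no longer reports."""
    changes: Dict[int, str] = {}
    for pid, info in inventory.items():
        if pid in os_pids:
            continue  # still running
        if info.get("is_ap") or info.get("deallocate"):
            changes[pid] = "Stopped"
        else:
            changes[pid] = "Failed"
    return changes


if __name__ == "__main__":
    # Inventory recovered from the OR state (hypothetical fields).
    inventory = {4242: {"deallocate": False},
                 4343: {"deallocate": False},
                 4444: {"is_ap": True}}
    sample_ps = "    1\n 4242\n 5001\n"  # what `ps -e -o pid=` might print
    print(exited_children(inventory, parse_ps_pids(sample_ps)))
    # prints {4343: 'Failed', 4444: 'Stopped'}
```

In a live agent, fetch_os_pids() would take the place of the sample string on
each timer tick.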
- An agent will no longer call waitFor() on the Process object returned from a
ProcessBuilder when a child process is launched.
- An agent will continue to drain the stdout and stderr of a child process to
prevent the child (duccling) from hanging and to receive OS errors that may
occur when exec'ing a process (bad command line, etc.). After duccling calls
execve(), the child's stdout and stderr are redirected to /dev/null and the
agent expects nothing further from these streams.
- A child process will communicate state changes and initialization status to
an agent via a provided port. The open question is how the port is provided to
the child. Currently an agent uses -D (or an environment variable) to pass its
listener port to a child. The port is determined when the agent starts up and
may be different after the agent is bounced. So we either use a Registry to
store the agent's port for a child to look up, or require that an agent use a
fixed port. If an agent is bounced and that port is not available, what should
happen?
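The stream-draining behavior in the list above can be sketched with Python's
subprocess module as a stand-in for Java's ProcessBuilder/Process. This is an
illustration under assumptions, not agent code: launch_and_drain is a
hypothetical helper, and the child is a short Python command standing in for
duccling. It never calls proc.wait(); the drain threads only read until the
pipes close.

```python
import subprocess
import sys
import threading
from typing import IO, List, Sequence, Tuple


def _drain(stream: IO[str], sink: List[str]) -> None:
    """Read a child's stream until EOF so the child never blocks on a full pipe."""
    for line in stream:
        sink.append(line.rstrip("\n"))
    stream.close()


def launch_and_drain(cmd: Sequence[str]
                     ) -> Tuple[subprocess.Popen, List[str], List[str],
                                List[threading.Thread]]:
    """Launch a child and drain its stdout/stderr on background threads.

    Deliberately does NOT call proc.wait(): termination is detected later by
    PID reconciliation, not by blocking on the process handle.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, text=True)
    out: List[str] = []
    err: List[str] = []  # e.g. exec errors such as a bad command line
    threads = [
        threading.Thread(target=_drain, args=(proc.stdout, out), daemon=True),
        threading.Thread(target=_drain, args=(proc.stderr, err), daemon=True),
    ]
    for t in threads:
        t.start()
    return proc, out, err, threads


if __name__ == "__main__":
    proc, out, err, threads = launch_and_drain(
        [sys.executable, "-c",
         "import sys; print('started'); print('bad cmd line', file=sys.stderr)"])
    for t in threads:
        t.join(timeout=10)  # wait for the pipes to close, not for the process
    print(proc.pid, out, err)
```

In the agent, draining mostly matters before execve(); afterwards the streams
point at /dev/null, and termination is detected by the PID reconciliation
described earlier rather than by the process handle.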