[jira] [Created] (UIMA-5567) UIMA-DUCC: Agent should be able to recover its state after restart

Jerry Cwiklik (JIRA) Wed, 13 Sep 2017 11:40:26 -0700

Jerry Cwiklik created UIMA-5567:
-----------------------------------

             Summary: UIMA-DUCC: Agent should be able to recover its state after restart
                 Key: UIMA-5567
                 URL: https://issues.apache.org/jira/browse/UIMA-5567
             Project: UIMA
          Issue Type: Improvement
          Components: DUCC
            Reporter: Jerry Cwiklik
            Assignee: Jerry Cwiklik
             Fix For: future-DUCC



Currently, bouncing (restarting) an agent is not possible. After launching a
child process, an agent adds an entry to its Process Inventory and uses the
Process handle to call waitFor() to detect child termination. When an agent
restarts, it loses all of its children and has no means of recovering its
inventory.

The proposal is to change this behavior so that agents can be bounced and can
subsequently recover their child processes. A bounce may be required, for
example, to update agent code.

An agent has two ways to recover its child processes, depending on cgroup
availability.

If cgroups are enabled, an agent on startup will read all PIDs from the
cgroup.procs file. These PIDs identify the child processes running on the node.
The agent will create a skeleton inventory entry for each PID and fill in the
details when the OR state is received, using the PID to find the matching
process in the OR state. Once the inventory is recovered, the timer-based
inventory update will fetch the PIDs from cgroup.procs again and reconcile them
with the inventory. To detect child process termination, the agent will compare
the PIDs in its inventory against the PIDs in cgroup.procs. If a PID is in the
inventory but not in cgroup.procs, the agent will mark the process as Stopped
if its deallocate flag is true, or as Failed if the flag is false. Any AP
process that is no longer running will be marked as Stopped.
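
The cgroup-based recovery and reconciliation described above could be sketched
roughly as follows. This is only an illustration under stated assumptions: the
class, the Entry type, the State enum, and all method names are hypothetical
and are not taken from the DUCC code base. The input is assumed to be the lines
of a cgroup.procs-style file (one PID per line).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; none of these names come from the DUCC code base.
final class CgroupReconcileSketch {

    // Only Stopped and Failed are named in the proposal; RUNNING is assumed.
    enum State { RUNNING, STOPPED, FAILED }

    // Skeleton inventory entry: created from a PID alone, details filled in later.
    static final class Entry {
        final long pid;
        boolean deallocate;          // assumed to be filled in from the OR state
        boolean ap;                  // assumed marker for an AP process
        State state = State.RUNNING;
        Entry(long pid) { this.pid = pid; }
    }

    // Parses the lines of a cgroup.procs-style file: one PID per line.
    static Set<Long> parsePids(List<String> lines) {
        Set<Long> pids = new HashSet<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                pids.add(Long.parseLong(trimmed));
            }
        }
        return pids;
    }

    // On startup: one skeleton entry per PID found in the cgroup.
    static Map<Long, Entry> skeletonInventory(Set<Long> pids) {
        Map<Long, Entry> inventory = new HashMap<>();
        for (long pid : pids) {
            inventory.put(pid, new Entry(pid));
        }
        return inventory;
    }

    // Timer-based pass: a PID in the inventory but absent from the cgroup
    // means the child has terminated.
    static void reconcile(Map<Long, Entry> inventory, Set<Long> livePids) {
        for (Entry e : inventory.values()) {
            if (e.state == State.RUNNING && !livePids.contains(e.pid)) {
                e.state = (e.ap || e.deallocate) ? State.STOPPED : State.FAILED;
            }
        }
    }
}
```

In a real agent the lines would come from reading the cgroup file on each timer
tick (for example with java.nio.file.Files.readAllLines), and the deallocate
flag would be set once the OR state arrives.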

If cgroups are not enabled, an agent will recover its inventory from the OR
state. While in this mode, the agent will disable its Rogue Process Detector
and will not attempt to detect alien processes. The timer-based inventory
update will fetch PIDs from the OS (using the ps command) and reconcile them
with the inventory. To detect child process termination, the agent will compare
the PIDs in its inventory against the PIDs obtained from the OS. If a PID is in
the inventory but not present in the OS, the agent will mark the process as
Stopped if its deallocate flag is true, or as Failed if the flag is false. Any
AP process that is no longer running will be marked as Stopped.
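
A minimal sketch of the ps-based variant is shown below. It assumes a
POSIX-style ps that accepts "-e -o pid=" (all processes, PID column only, no
header). The class and method names are hypothetical, and the proposal does not
say which ps options the agent would actually use.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; none of these names come from the DUCC code base.
final class PsInventorySketch {

    // Extracts PIDs from ps output, skipping blank or non-numeric lines.
    static Set<Long> parsePsOutput(List<String> lines) {
        Set<Long> pids = new TreeSet<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.matches("\\d+")) {
                pids.add(Long.parseLong(trimmed));
            }
        }
        return pids;
    }

    // Runs ps and collects the PIDs of all processes currently known to the OS.
    static Set<Long> fetchOsPids() throws IOException, InterruptedException {
        Process ps = new ProcessBuilder("ps", "-e", "-o", "pid=")
                .redirectErrorStream(true)
                .start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(ps.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        ps.waitFor(); // waits for the short-lived ps helper, not for a managed child
        return parsePsOutput(lines);
    }

    // PIDs that are in the inventory but no longer present in the OS.
    static Set<Long> missingFromOs(Set<Long> inventoryPids, Set<Long> osPids) {
        Set<Long> missing = new TreeSet<>(inventoryPids);
        missing.removeAll(osPids);
        return missing;
    }
}
```

Each PID returned by missingFromOs would then be marked Stopped or Failed
according to the deallocate flag, as described above. A PID-only comparison can
be fooled by PID reuse, which the proposal does not address.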


- An agent will no longer call waitFor() on a Process object returned from a 
ProcessBuilder when a child process is launched

- An agent will continue to drain the stdout and stderr of a child process, both
to prevent the child (duccling) from hanging and to receive any OS errors that
occur when exec'ing a process (bad command line, etc.). After duccling calls
execve(), the child process's stdout and stderr are redirected to /dev/null and
the agent expects nothing further from these streams.

- A child process will communicate state changes and initialization status to
an agent via a provided port. The open question is how the port is provided to
the child. Currently an agent uses -D (or an environment variable) to
communicate its listener port to a child. The port is determined when the agent
starts up and may be different after the agent is bounced. So we must either
use a Registry to store the agent's port for a child to look up, or insist that
the agent has a fixed port. If an agent is bounced and its fixed port is not
available, what should happen?
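
To illustrate the first two points above (launching a child without calling
waitFor() while still draining its stdout and stderr), here is a minimal
sketch. The class and method names are hypothetical and do not come from the
DUCC code base; the sink is assumed to be thread-safe because two drainer
threads write to it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

// Hypothetical sketch; none of these names come from the DUCC code base.
final class StreamDrainSketch {

    // Reads the stream to EOF, passing each line to the sink.
    static void drain(InputStream in, Consumer<String> sink) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sink.accept(line);
            }
        } catch (IOException e) {
            sink.accept("drain error: " + e.getMessage());
        }
    }

    // Starts a daemon thread that drains the stream in the background.
    static Thread drainAsync(InputStream in, Consumer<String> sink, String name) {
        Thread t = new Thread(() -> drain(in, sink), name);
        t.setDaemon(true);
        t.start();
        return t;
    }

    // Launches a child and drains both of its streams. It deliberately does
    // not call waitFor(); termination is detected by the inventory pass.
    static Process launch(ProcessBuilder builder, Consumer<String> sink)
            throws IOException {
        Process child = builder.start();
        drainAsync(child.getInputStream(), sink, "stdout-drainer");
        drainAsync(child.getErrorStream(), sink, "stderr-drainer");
        return child;
    }
}
```

Once the child's streams are redirected to /dev/null after execve(), both
drainer threads see end-of-file and exit on their own.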




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
