Jim Challenger created UIMA-3389:
------------------------------------
Summary: DUCC RM bypasses node stability for some nodes at recovery
Key: UIMA-3389
URL: https://issues.apache.org/jira/browse/UIMA-3389
Project: UIMA
Issue Type: Bug
Components: DUCC
Affects Versions: 1.0-Ducc
Reporter: Jim Challenger
Assignee: Jim Challenger
Priority: Minor
Fix For: 1.0-Ducc
During RM hot-start or even warm-start, work can arrive that has already been
assigned to some node, where that node went down while RM was also down. The
design is that RM simply adds all nodes that have assigned work to its list of
live nodes and lets normal NodeStability take a node out if it's really not
responding. Unfortunately, RM was notifying the Scheduler object directly
instead of the NodeStability object (which then notifies the Scheduler as well
as setting up the node for monitoring), so in this case unresponsive nodes
never get removed.
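
The difference between the two notification paths can be illustrated with a
minimal sketch. This is not the DUCC source: the class names SketchScheduler,
SketchNodeStability and RecoveryDemo, their methods, and the missed-heartbeat
threshold are all hypothetical stand-ins chosen only to show why a node that
is registered with the scheduler alone is never monitored and therefore never
removed.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the RM scheduler: it only tracks which nodes are live.
class SketchScheduler {
    final Set<String> liveNodes = new HashSet<>();
    void nodeArrives(String node) { liveNodes.add(node); }
    void nodeDeath(String node)   { liveNodes.remove(node); }
}

// Hypothetical stand-in for NodeStability: counts missed heartbeats per
// monitored node and reports a node as dead once a threshold is reached.
class SketchNodeStability {
    private final SketchScheduler scheduler;
    private final int missThreshold;
    private final Map<String, Integer> missed = new HashMap<>();

    SketchNodeStability(SketchScheduler scheduler, int missThreshold) {
        this.scheduler = scheduler;
        this.missThreshold = missThreshold;
    }

    // Sets the node up for monitoring AND forwards the arrival to the scheduler.
    void nodeArrives(String node) {
        missed.put(node, 0);
        scheduler.nodeArrives(node);
    }

    // A heartbeat resets the miss counter, but only for monitored nodes.
    void heartbeat(String node) {
        if (missed.containsKey(node)) {
            missed.put(node, 0);
        }
    }

    // One monitoring interval: every monitored node accrues a miss; nodes that
    // reach the threshold are dropped and the scheduler is told they are dead.
    void tick() {
        Iterator<Map.Entry<String, Integer>> it = missed.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Integer> e = it.next();
            int count = e.getValue() + 1;
            if (count >= missThreshold) {
                it.remove();
                scheduler.nodeDeath(e.getKey());
            } else {
                e.setValue(count);
            }
        }
    }
}

class RecoveryDemo {
    // Simulates recovery where n1 is healthy and n2 went down while RM was down.
    // viaNodeStability=false models the buggy path, true models the intended path.
    static Set<String> recover(boolean viaNodeStability) {
        SketchScheduler scheduler = new SketchScheduler();
        SketchNodeStability stability = new SketchNodeStability(scheduler, 3);
        for (String node : new String[] {"n1", "n2"}) {
            if (viaNodeStability) {
                stability.nodeArrives(node);   // monitored, then forwarded
            } else {
                scheduler.nodeArrives(node);   // live, but never monitored
            }
        }
        for (int i = 0; i < 5; i++) {
            stability.heartbeat("n1");         // only n1 is still responding
            stability.tick();
        }
        return scheduler.liveNodes;
    }
}
```

In this sketch, RecoveryDemo.recover(false) leaves both n1 and n2 in the live
set because nothing ever counts n2's missed heartbeats, while
RecoveryDemo.recover(true) leaves only n1.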