Jim Challenger created UIMA-3389:
------------------------------------

             Summary: DUCC RM bypasses node stability for some nodes at recovery
                 Key: UIMA-3389
                 URL: https://issues.apache.org/jira/browse/UIMA-3389
             Project: UIMA
          Issue Type: Bug
          Components: DUCC
    Affects Versions: 1.0-Ducc
            Reporter: Jim Challenger
            Assignee: Jim Challenger
            Priority: Minor
             Fix For: 1.0-Ducc


During RM hot-start or even warm-start, work can arrive that has already been 
assigned to some node, where that node went down while RM was also down.  The 
design is that RM simply adds all nodes that have assigned work to its list of 
live nodes and lets normal NodeStability take them out if they are really not 
responding.  Unfortunately, RM was notifying the Scheduler object and not the 
NodeStability object (which will then notify the Scheduler as well as set up the 
node for monitoring), so unresponsive nodes never get removed in this case.
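
A minimal sketch of the difference between the two notification paths follows. 
It is not the DUCC source: SketchScheduler, SketchNodeStability, their methods, 
and the missed-heartbeat threshold are all hypothetical names invented for 
illustration.  It only shows why a node reported straight to the scheduler is 
never monitored, and so never removed.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the RM scheduler (not the real DUCC class).
class SketchScheduler {
    final Set<String> liveNodes = new HashSet<>();

    void nodeArrives(String node) { liveNodes.add(node); }
    void nodeDeath(String node)   { liveNodes.remove(node); }
}

// Hypothetical stand-in for NodeStability: counts missed heartbeats per
// monitored node and reports a node dead once the threshold is exceeded.
class SketchNodeStability {
    private final SketchScheduler scheduler;
    private final int maxMissed;  // assumed threshold, for illustration only
    private final Map<String, Integer> missed = new HashMap<>();

    SketchNodeStability(SketchScheduler scheduler, int maxMissed) {
        this.scheduler = scheduler;
        this.maxMissed = maxMissed;
    }

    // Set the node up for monitoring, then pass the arrival to the scheduler.
    void nodeArrives(String node) {
        missed.put(node, 0);
        scheduler.nodeArrives(node);
    }

    // A heartbeat resets the counter; an unknown node is treated as arriving.
    void heartbeat(String node) {
        if (missed.containsKey(node)) {
            missed.put(node, 0);
        } else {
            nodeArrives(node);
        }
    }

    // One monitoring interval: age every monitored node and drop the ones
    // that have missed more than maxMissed intervals.
    void tick() {
        Iterator<Map.Entry<String, Integer>> it = missed.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Integer> entry = it.next();
            String node = entry.getKey();
            int count = entry.getValue() + 1;
            if (count > maxMissed) {
                it.remove();
                scheduler.nodeDeath(node);
            } else {
                entry.setValue(count);
            }
        }
    }
}
```

In this sketch, a recovery path that calls SketchScheduler.nodeArrives directly 
leaves the node in liveNodes no matter how many intervals pass, because the 
stability object never learns about it.  Calling 
SketchNodeStability.nodeArrives instead registers the node for monitoring, so a 
node that never sends a heartbeat is dropped after the threshold.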



--
This message was sent by Atlassian JIRA
(v6.1#6144)
