[ 
https://issues.apache.org/jira/browse/UIMA-2772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13613070#comment-13613070
 ] 

Lou DeGenaro commented on UIMA-2772:
------------------------------------

Update transport so that DuccProcess and DuccReservation carry a Node field, 
and provide a getter/setter and constructors that use it.

Update orchestrator to use the newly added constructors.

Code delivered.
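
For readers without the source at hand, the shape of the transport change can 
be sketched roughly as follows. This is an illustration only, not the 
delivered code: NodeSketch and ProcessSketch are hypothetical stand-ins for 
Node and DuccProcess, and the real classes carry more fields and different 
constructor signatures.

```java
import java.io.Serializable;

// Hypothetical stand-in for the Node object carried on the transport.
final class NodeSketch implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String name;

    NodeSketch(String name) { this.name = name; }

    String getName() { return name; }
}

// Hypothetical stand-in for DuccProcess (DuccReservation would follow the same pattern).
class ProcessSketch implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String id;
    private NodeSketch node; // the newly carried field

    // Constructor without a node, for callers that do not know it yet.
    ProcessSketch(String id) { this(id, null); }

    // Constructor taking the node, as the orchestrator would use.
    ProcessSketch(String id, NodeSketch node) {
        this.id = id;
        this.node = node;
    }

    String getId() { return id; }

    NodeSketch getNode() { return node; }

    void setNode(NodeSketch node) { this.node = node; }
}
```

With something like this in place, the orchestrator can build each process 
with its node attached, e.g. new ProcessSketch("p1", new NodeSketch("node-a")), 
and the node travels with the process on every publication.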
                
> DUCC resource manager - Restart and fast-start
> ----------------------------------------------
>
>                 Key: UIMA-2772
>                 URL: https://issues.apache.org/jira/browse/UIMA-2772
>             Project: UIMA
>          Issue Type: Bug
>          Components: DUCC
>            Reporter: Jim Challenger
>            Assignee: Jim Challenger
>
> Currently RM waits a "reasonable time" (init-stability) on startup to allow 
> nodes to check in, before accepting scheduling requests.  It is not possible 
> to know exactly how long to wait, making init-stability a heuristic.  For 
> normal startup this is not a problem.  If RM is restarting 'hot', or if the 
> orchestrator publishes non-preemptable jobs on restart, and the necessary 
> nodes have not arrived by the completion of init-stability wait, this can 
> cause many problems: over-commitment, under-commitment, and in some cases  
> inconsistent state (and crashes).
> To remedy this, RM will include the full Node object in its publications to 
> the OR, which will echo them back for work that it believes to be active. On 
> startup RM can fully reconstruct state as of its last publication from this, 
> eliminating the problem. A side-effect of this is that RM need not wait for 
> nodes to check in, significantly decreasing its startup time.  If nodes added 
> to the resource pool in this way never check in, the normal "dead node" 
> mechanism will kick in, maintaining consistency.
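
The restart flow described above might look roughly like the sketch below. 
It is a simplified illustration under stated assumptions, not RM's actual 
implementation: EchoedShare, NodeState, ResourcePoolSketch and the 
timeout-based expiry are all hypothetical names and mechanics chosen for the 
example, and the real "dead node" mechanism may work differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical: one piece of active work echoed back by the OR, with its node.
final class EchoedShare {
    final String workId;
    final String nodeName;

    EchoedShare(String workId, String nodeName) {
        this.workId = workId;
        this.nodeName = nodeName;
    }
}

// Hypothetical: what the pool remembers about a node.
final class NodeState {
    final String nodeName;
    final long addedAtMillis;
    final List<String> workIds = new ArrayList<>();
    boolean checkedIn = false;

    NodeState(String nodeName, long addedAtMillis) {
        this.nodeName = nodeName;
        this.addedAtMillis = addedAtMillis;
    }
}

final class ResourcePoolSketch {
    private final Map<String, NodeState> nodes = new HashMap<>();

    // On startup: rebuild the pool from the echoed publication, without waiting for check-ins.
    void rebuildFrom(List<EchoedShare> echoed, long nowMillis) {
        for (EchoedShare s : echoed) {
            NodeState st = nodes.computeIfAbsent(s.nodeName, n -> new NodeState(n, nowMillis));
            st.workIds.add(s.workId);
        }
    }

    // A node reports in; nodes not seen in the echo are simply added.
    void nodeCheckedIn(String nodeName, long nowMillis) {
        nodes.computeIfAbsent(nodeName, n -> new NodeState(n, nowMillis)).checkedIn = true;
    }

    // Simplified "dead node" sweep: drop nodes that never checked in within the timeout.
    List<String> expireSilentNodes(long nowMillis, long timeoutMillis) {
        List<String> dead = new ArrayList<>();
        nodes.values().removeIf(st -> {
            boolean expired = !st.checkedIn && nowMillis - st.addedAtMillis >= timeoutMillis;
            if (expired) {
                dead.add(st.nodeName);
            }
            return expired;
        });
        return dead;
    }

    boolean knows(String nodeName) { return nodes.containsKey(nodeName); }

    List<String> workOn(String nodeName) {
        NodeState st = nodes.get(nodeName);
        return st == null ? new ArrayList<>() : new ArrayList<>(st.workIds);
    }
}
```

The point of the design is that the pool is usable immediately after 
rebuildFrom, and any node that was reconstructed from the echo but never 
reports in is removed later by the sweep rather than blocking startup.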

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
