[ 
https://issues.apache.org/jira/browse/SLIDER-594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193984#comment-14193984
 ] 

Steve Loughran commented on SLIDER-594:
---------------------------------------

bq. add delay to all container starts through a configurable delay value. We 
can create a patch and see if thats reasonable.

If the {{RoleLaunchService}} were changed so that all launches went through a 
{{DelayQueue}} (see {{QueueService}}), this would be transparent.
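
A minimal sketch of the {{DelayQueue}} idea, using only {{java.util.concurrent}}. The class names {{DelayedLaunch}} and {{DelayedLaunchSketch}} are hypothetical stand-ins, not the actual {{RoleLaunchService}} or {{QueueService}} code; the point is only that a consumer blocked on {{take()}} sees nothing until the delay has expired.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a queued container launch; not a Slider class.
class DelayedLaunch implements Delayed {
    final String containerId;
    private final long readyAtNanos;

    DelayedLaunch(String containerId, long delay, TimeUnit unit) {
        this.containerId = containerId;
        this.readyAtNanos = System.nanoTime() + unit.toNanos(delay);
    }

    @Override
    public long getDelay(TimeUnit unit) {
        // Remaining time until this launch may proceed; <= 0 means ready.
        return unit.convert(readyAtNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
    }

    @Override
    public int compareTo(Delayed other) {
        return Long.compare(getDelay(TimeUnit.NANOSECONDS),
                            other.getDelay(TimeUnit.NANOSECONDS));
    }
}

class DelayedLaunchSketch {
    /** Queues every launch with the same delay; returns the IDs once each became ready. */
    static List<String> launchAll(List<String> ids, long delayMillis) {
        DelayQueue<DelayedLaunch> queue = new DelayQueue<>();
        for (String id : ids) {
            queue.put(new DelayedLaunch(id, delayMillis, TimeUnit.MILLISECONDS));
        }
        List<String> launched = new ArrayList<>();
        try {
            while (launched.size() < ids.size()) {
                // take() blocks until the head element's delay has expired.
                launched.add(queue.take().containerId);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop early, keep the interrupt flag
        }
        return launched;
    }

    public static void main(String[] args) {
        System.out.println(launchAll(Arrays.asList("container_1", "container_2"), 100));
    }
}
```

With a delay of zero the queue behaves like an ordinary blocking queue, so the launch path would not need a separate "no delay" branch.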

Alternatively: if you look at the AM method {{onContainersCompleted}}, which 
gets the events from the RM, it triggers a review at the end via 

{{    reviewRequestAndReleaseNodes("onContainersCompleted")}}

All that does is add a review action to the back of the "immediate" queue:
{{    queue(new ReviewAndFlexApplicationSize(reason, 0, TimeUnit.SECONDS));}}

We could:
# set a delay and then {{schedule()}} the review. This is harmless if it's a 
flex down; all it does is postpone looking to see if other changes are needed.
# maybe (though this is getting clever) decide whether to queue or schedule 
based on container exit codes. I'd be against this just due to the complexity.
# maybe rework the logic in the AM's {{handleReviewAndFlexApplicationSize()}} 
event handler. This delays any size reviews until all pending queued 
size-changing events are completed; it would need to be extended to also delay 
if there is one scheduled. That way, no matter what events get issued, the 
review doesn't kick in until the end of the delay.
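
To make the first and third options concrete, here is a rough sketch built on a plain {{ScheduledExecutorService}} rather than the AM's own action queue. {{ReviewDebouncer}} and its methods are hypothetical names, and cancelling the pending review to restart the delay is one possible reading of "doesn't kick in until the end of the delay", not necessarily how the AM would do it.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper; stands in for scheduling a ReviewAndFlexApplicationSize action.
class ReviewDebouncer {
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();
    private final AtomicInteger reviewsRun = new AtomicInteger();
    private ScheduledFuture<?> pending; // guarded by "this"

    /** Schedules a review after the delay, replacing any review still waiting. */
    synchronized void requestReview(long delay, TimeUnit unit) {
        if (pending != null && !pending.isDone()) {
            pending.cancel(false); // restart the delay window
        }
        Runnable review = reviewsRun::incrementAndGet; // placeholder for the real review
        pending = executor.schedule(review, delay, unit);
    }

    /** Lets the waiting review run, stops the executor, and reports how many ran. */
    int shutdownAndCount(long timeout, TimeUnit unit) {
        executor.shutdown(); // already-scheduled delayed tasks still run by default
        try {
            executor.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return reviewsRun.get();
    }
}
```

Several container-completed events arriving in a burst would then produce a single review at the end of the delay instead of one review per event.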

This is more complex, isn't it? Let's go with the launch delay. There is some 
timeout after which YARN assumes there's a problem and takes away the 
container (some value in minutes, I suspect), so the delay has to stay well 
below that.

We'd need to make this delay configurable, and keep it low for the funtests, 
otherwise they will start timing out.
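
As an illustration only, reading such a delay could look like the sketch below. The option key, the default of zero, and the {{maxSeconds}} cap (standing in for whatever the YARN timeout turns out to be) are all assumptions made for the example, not existing Slider settings.

```java
import java.util.Map;

// Hypothetical config helper; the key name and default are invented for this sketch.
class LaunchDelayConfig {
    static final String KEY = "example.container.launch.delay.seconds";
    static final long DEFAULT_SECONDS = 0; // no delay unless configured, so tests stay fast

    /** Returns the configured delay, clamped to the range [0, maxSeconds]. */
    static long launchDelaySeconds(Map<String, String> options, long maxSeconds) {
        long value = DEFAULT_SECONDS;
        String raw = options.get(KEY);
        if (raw != null) {
            try {
                value = Long.parseLong(raw.trim());
            } catch (NumberFormatException e) {
                value = DEFAULT_SECONDS; // fall back rather than fail the launch
            }
        }
        return Math.max(0, Math.min(value, maxSeconds));
    }
}
```

The funtests could then set the option to a second or two, while a real cluster picks something long enough for ports to be released.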

> Add a sleep before container restart as ports may not be released from the 
> last activation
> ------------------------------------------------------------------------------------------
>
>                 Key: SLIDER-594
>                 URL: https://issues.apache.org/jira/browse/SLIDER-594
>             Project: Slider
>          Issue Type: Bug
>          Components: agent-provider
>    Affects Versions: Slider 0.50
>            Reporter: Sumit Mohanty
>            Assignee: Jonathan Maron
>            Priority: Critical
>             Fix For: Slider 0.60
>
>
> This is critical for applications that do not use dynamic ports, and 
> applications using labels do not use dynamic ports. A configurable delay 
> should be added to allow for scenarios where component instances get killed 
> rather than stopped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
