[
https://issues.apache.org/jira/browse/SLIDER-594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193984#comment-14193984
]
Steve Loughran commented on SLIDER-594:
---------------------------------------
bq. add delay to all container starts through a configurable delay value. We
can create a patch and see if that's reasonable.
If {{RoleLaunchService}} were changed so that all launches went through a
{{DelayQueue}} (see {{QueueService}}), this would be transparent.
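To illustrate the {{DelayQueue}} idea, here is a minimal, self-contained sketch using only {{java.util.concurrent}}. It is not Slider's {{RoleLaunchService}} or {{QueueService}} code: the class {{DelayedLaunchSketch}}, the nested {{DelayedLaunch}} type, the helper {{drainInLaunchOrder}}, and the container names are all hypothetical, chosen only to show how a launch request could sit in the queue until its delay expires.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

public class DelayedLaunchSketch {

  /** Hypothetical stand-in for a container launch request (not a Slider class). */
  public static final class DelayedLaunch implements Delayed {
    public final String containerId;
    private final long readyAtNanos;

    public DelayedLaunch(String containerId, long delay, TimeUnit unit) {
      this.containerId = containerId;
      this.readyAtNanos = System.nanoTime() + unit.toNanos(delay);
    }

    @Override
    public long getDelay(TimeUnit unit) {
      // Remaining time until this launch may proceed; <= 0 means "ready".
      return unit.convert(readyAtNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
    }

    @Override
    public int compareTo(Delayed other) {
      return Long.compare(getDelay(TimeUnit.NANOSECONDS),
          other.getDelay(TimeUnit.NANOSECONDS));
    }
  }

  /** Takes {@code count} launches, blocking until each one's delay has expired. */
  public static List<String> drainInLaunchOrder(DelayQueue<DelayedLaunch> queue, int count) {
    List<String> launched = new ArrayList<>();
    try {
      for (int i = 0; i < count; i++) {
        launched.add(queue.take().containerId);
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return launched;
  }

  public static void main(String[] args) {
    DelayQueue<DelayedLaunch> launches = new DelayQueue<>();
    // The delay values are arbitrary; in practice they would come from configuration.
    launches.put(new DelayedLaunch("container_02", 300, TimeUnit.MILLISECONDS));
    launches.put(new DelayedLaunch("container_01", 100, TimeUnit.MILLISECONDS));

    for (String id : drainInLaunchOrder(launches, 2)) {
      System.out.println("launching " + id);
    }
  }
}
```

Callers only enqueue a request; the waiting happens inside the queue, which is why the change would be invisible to the rest of the launch path.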
Alternatively: if you look at the AM method {{onContainersCompleted}}, which
gets the events from the RM, it triggers a review at the end via
{{reviewRequestAndReleaseNodes("onContainersCompleted")}}
All that does is add a review action to the back of the "immediate" queue:
{{queue(new ReviewAndFlexApplicationSize(reason, 0, TimeUnit.SECONDS));}}
We could
# set a delay and then {{schedule()}} the review. This is harmless if it's a
flex down; all it does is postpone looking to see if other changes are needed.
# maybe (though this is getting clever) decide whether to queue or schedule
based on container exit codes. I'd be against this just because of the complexity.
# maybe rework the logic in the AM's {{handleReviewAndFlexApplicationSize()}}
event handler. This delays any size review until all pending queued
size-changing events have completed; it would need to be extended to also delay
if a review is already scheduled. That way, no matter what events get issued, the
review doesn't kick in until the end of the delay.
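A rough sketch of the "schedule the review, and hold back if one is already scheduled" idea follows. It uses a plain {{ScheduledExecutorService}} rather than the AM's own queues, and the class {{DelayedReviewSketch}} and its methods are hypothetical names for illustration only, not Slider code.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class DelayedReviewSketch {
  private final ScheduledExecutorService executor =
      Executors.newSingleThreadScheduledExecutor();
  private final long delayMillis;
  private final Runnable review;
  private ScheduledFuture<?> pending; // guarded by "this"

  public DelayedReviewSketch(long delayMillis, Runnable review) {
    this.delayMillis = delayMillis;
    this.review = review;
  }

  /**
   * Schedules a review after the delay unless one is already pending.
   * Returns true if a new review was scheduled, false if the request
   * was folded into the pending one.
   */
  public synchronized boolean requestReview() {
    if (pending != null && !pending.isDone()) {
      return false;
    }
    pending = executor.schedule(review, delayMillis, TimeUnit.MILLISECONDS);
    return true;
  }

  /** Stops accepting work and waits for any pending review to run. */
  public boolean shutdownAndWait(long timeoutMillis) {
    executor.shutdown();
    try {
      return executor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    AtomicInteger reviews = new AtomicInteger();
    DelayedReviewSketch sketch = new DelayedReviewSketch(100, reviews::incrementAndGet);

    // Three completion events arrive close together; only one review is scheduled.
    sketch.requestReview();
    sketch.requestReview();
    sketch.requestReview();

    sketch.shutdownAndWait(2000);
    System.out.println("reviews run: " + reviews.get());
  }
}
```

The sketch only coalesces requests behind an already-scheduled review; it does not model the queued size-changing events that the real handler also waits for.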
This is more complex, isn't it? Let's go with the launch delay. There is some
timeout after which YARN assumes there's a problem and takes away the
container; I suspect it is some value in minutes.
We'd need to make this delay configurable and keep it low for the funtests;
otherwise they will start timing out.
> Add a sleep before container restart as ports may not be released from the
> last activation
> ------------------------------------------------------------------------------------------
>
> Key: SLIDER-594
> URL: https://issues.apache.org/jira/browse/SLIDER-594
> Project: Slider
> Issue Type: Bug
> Components: agent-provider
> Affects Versions: Slider 0.50
> Reporter: Sumit Mohanty
> Assignee: Jonathan Maron
> Priority: Critical
> Fix For: Slider 0.60
>
>
> This is critical for applications that do not use dynamic ports, and
> applications using labels do not use dynamic ports. A configurable delay
> should be added to allow for scenarios where component instances get killed
> rather than stopped.