[jira] [Commented] (SLIDER-479) Provide a slider command to kill all stranded containers continuing to run post stop command

Gour Saha (JIRA) Mon, 24 Nov 2014 10:00:29 -0800

    [ https://issues.apache.org/jira/browse/SLIDER-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223206#comment-14223206 ]


Gour Saha commented on SLIDER-479:
----------------------------------

That makes sense, but defining a "long period of time" could be tricky: for 
long-running services (particularly applications that demand a lot of 
affinity), the agents are designed to keep running and wait for the failed AM 
to come back up. We can certainly expose an application-specific, configurable 
time period. Would ephemeral nodes in ZK make sense for controlling when the 
agents are killed?
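
To make the configurable time period concrete, here is a minimal sketch of how an agent might decide when to stop waiting for the AM. Everything in it is an assumption for illustration: `AmWatchdog` and `am_wait_timeout_secs` are hypothetical names, not existing Slider classes or properties. A timeout of `None` models the high-affinity case where the agent should wait indefinitely.

```python
import time
from typing import Callable, Optional


class AmWatchdog:
    """Hypothetical helper: tracks how long the AM has been unreachable."""

    def __init__(self, am_wait_timeout_secs: Optional[float],
                 clock: Callable[[], float] = time.monotonic) -> None:
        # am_wait_timeout_secs is an assumed per-application setting;
        # None means "never give up waiting for the AM".
        self.timeout = am_wait_timeout_secs
        self.clock = clock
        self.last_contact = clock()

    def heartbeat_succeeded(self) -> None:
        """Record that the AM answered a heartbeat just now."""
        self.last_contact = self.clock()

    def should_self_terminate(self) -> bool:
        """True once the AM has been silent for longer than the timeout."""
        if self.timeout is None:
            return False
        return self.clock() - self.last_contact > self.timeout
```

With an ephemeral znode, the liveness signal would come from the ZK session ending rather than from a local clock, but the per-application timeout question stays the same.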

> Provide a slider command to kill all stranded containers continuing to run 
> post stop command
> --------------------------------------------------------------------------------------------
>
>                 Key: SLIDER-479
>                 URL: https://issues.apache.org/jira/browse/SLIDER-479
>             Project: Slider
>          Issue Type: Bug
>            Reporter: Gour Saha
>             Fix For: Slider 2.0.0
>
>
> A container can continue to run even after a slider stop command has been 
> issued. One such scenario is when the NM of a node not hosting the Slider-AM 
> is lost and the slider stop command is issued before the Slider-AM can clean 
> up the stranded agent (and the application processes). In that case, even if 
> the NM is brought back up, it will not kill these containers.
> In a large cluster with several applications deployed/managed by Slider, there 
> could easily be numerous such stranded containers.
> The Slider client could expose a "stop-all" command, or perhaps a "stop 
> --clean" option (or anything appropriate for this task), to do the cleanup. It 
> can bring up the Slider-AM in a clean mode (say), which will not start any 
> application but will simply register with ZK and wait for agents to heartbeat 
> into it. Each of these agents will receive the terminate command from the 
> AM, do the necessary cleanup, and shut down.
> This new command can be issued only after an application has been stopped. 
> When invoked while the application is running, the command should fail 
> with relevant information. The command can also provide a summary of 
> how many stranded containers it cleaned up.
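
The clean-mode flow described in the issue can be sketched roughly as follows. This is an in-memory illustration only, not Slider code: `CleanModeAM`, `ApplicationRunningError`, and the `"TERMINATE"` reply are hypothetical names, and the real AM would register with ZK and talk to agents over the network.

```python
from dataclasses import dataclass, field
from typing import Set


class ApplicationRunningError(Exception):
    """Raised when cleanup is requested while the application is still running."""


@dataclass
class CleanModeAM:
    """Hypothetical AM that starts no application and only terminates agents."""
    app_name: str
    app_running: bool
    terminated: Set[str] = field(default_factory=set)

    def start(self) -> None:
        # The cleanup command is only valid once the application is stopped.
        if self.app_running:
            raise ApplicationRunningError(
                f"{self.app_name} is still running; stop it before cleaning up")

    def on_heartbeat(self, container_id: str) -> str:
        # Every agent that heartbeats in is told to clean up and shut down.
        # A set keeps repeated heartbeats from inflating the summary count.
        self.terminated.add(container_id)
        return "TERMINATE"

    def summary(self) -> str:
        return (f"{len(self.terminated)} stranded container(s) "
                f"cleaned up for {self.app_name}")
```

How long the clean-mode AM should wait for late heartbeats before exiting is left open here, as it is in the description.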



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
