[ 
https://issues.apache.org/jira/browse/SLIDER-594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193863#comment-14193863
 ] 

Sumit Mohanty commented on SLIDER-594:
--------------------------------------

Well, the simplest fix for now is to add a configurable delay before every 
container start. We can create a patch and see if that's reasonable.
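
A minimal sketch of what such a configurable delay could look like, in Python. 
The property name "container.start.delay.sec" and both helper functions are 
hypothetical and only illustrate the idea; they are not existing Slider 
settings or agent code.

```python
import time

# Hypothetical property name; not an existing Slider setting.
DELAY_KEY = "container.start.delay.sec"


def start_delay_seconds(config, default=0.0):
    """Return the configured start delay in seconds, never negative."""
    try:
        delay = float(config.get(DELAY_KEY, default))
    except (TypeError, ValueError):
        # Fall back to the default when the value is missing or malformed.
        delay = float(default)
    return max(0.0, delay)


def start_with_delay(config, start_fn, sleep=time.sleep):
    """Sleep for the configured delay, then invoke the start callback."""
    delay = start_delay_seconds(config)
    if delay > 0:
        sleep(delay)
    return start_fn()
```

A default of zero keeps the current behaviour for apps that use dynamic ports 
and do not need the delay.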

While we noticed a few scenarios where container restart failed due to a port 
conflict, it is also evident that without YARN-1922 containers were not getting 
cleaned up properly. So the port conflict was likely a real issue, because the 
prior processes did not get killed.

Having the agent monitor the ports and delay the start seems like a good way 
to go. This will require the app definition to indicate which properties are 
ports. As a long-term fix, we should investigate a notion of named ports that 
other application properties refer to. That way Slider/YARN allocates N ports 
to the app, based on how many the app wants, and makes them available in a 
well-known named list of ports, literally something like 
"allocated_ports": "port1=..., port2=...". App config can then refer to them 
as allocated_ports[index].
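
Both ideas can be sketched roughly as follows, in Python. Everything here is 
an assumption for illustration: the function names, the timeout and interval 
defaults, and the exact allocated_ports[index] reference syntax are 
hypothetical, and a bind attempt is only one possible way to test whether a 
port is free.

```python
import re
import socket
import time

# Hypothetical reference syntax, following the allocated_ports[index] idea.
PORT_REF = re.compile(r"allocated_ports\[(\d+)\]")


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def wait_for_ports(ports, timeout=30.0, interval=1.0, is_free=port_is_free):
    """Poll until all ports are free or the timeout expires.

    Returns the list of ports that are still busy (empty on success).
    """
    deadline = time.monotonic() + timeout
    busy = [p for p in ports if not is_free(p)]
    while busy and time.monotonic() < deadline:
        time.sleep(interval)
        busy = [p for p in busy if not is_free(p)]
    return busy


def resolve_port_refs(config, allocated_ports):
    """Replace allocated_ports[i] references in string config values.

    Raises IndexError if a reference points past the allocated list.
    """
    def substitute(match):
        return str(allocated_ports[int(match.group(1))])

    return {key: PORT_REF.sub(substitute, value) for key, value in config.items()}
```

The agent would call wait_for_ports() with the properties marked as ports 
before starting the component, and start (or report a failure) depending on 
whether the returned list is empty.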

> Add a sleep before container restart as ports may not be released from the 
> last activation
> ------------------------------------------------------------------------------------------
>
>                 Key: SLIDER-594
>                 URL: https://issues.apache.org/jira/browse/SLIDER-594
>             Project: Slider
>          Issue Type: Bug
>          Components: agent-provider
>    Affects Versions: Slider 0.50
>            Reporter: Sumit Mohanty
>            Assignee: Jonathan Maron
>            Priority: Critical
>             Fix For: Slider 0.60
>
>
> This is critical for applications that do not use dynamic ports, and 
> applications using labels do not use dynamic ports. A configurable delay 
> should be added to allow for scenarios where component instances get killed 
> rather than stopped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
