[
https://issues.apache.org/jira/browse/SLIDER-594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193863#comment-14193863
]
Sumit Mohanty commented on SLIDER-594:
--------------------------------------
Well, the simplest fix for now is to add a delay to all container starts
through a configurable delay value. We can create a patch and see if that's
reasonable.
While we noticed a few scenarios of container restarts failing due to port
conflicts, it is also evident that without YARN-1922 containers were not getting
cleaned up properly. So the port conflicts were likely a genuine issue, as the
prior processes did not get killed.
Having the agent monitor the ports and delay the start seems like a good way to
go. This will require the app definition to indicate which properties are
ports. As a long-term fix, we should investigate a notion of named ports that
are referred to by other application properties. That way Slider/YARN allocates
N ports to the app, based on how many the app wants, and these ports are
available in a well-known named list of ports. It can literally be
"allocated_ports": port1=..., port2=..., etc. App config can refer to them as
allocated_ports[index].
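
To make the shorter-term idea concrete (the agent checks the ports and delays
the start), here is a rough Python sketch. It is not the Slider agent's actual
code: the function names, the max_delay_secs and poll_secs parameters, and the
bind-based check are all assumptions made for illustration. It polls until
every listed port can be bound, or until a configurable maximum delay expires.

```python
import socket
import time


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can currently bind to (host, port).

    Hypothetical helper. A plain bind without SO_REUSEADDR is a conservative
    check: it can also fail while the previous owner is still shutting down.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        return True
    except OSError:
        return False
    finally:
        sock.close()


def wait_for_ports(ports, max_delay_secs=30.0, poll_secs=0.5, host="0.0.0.0"):
    """Wait until all ports are free or max_delay_secs has elapsed.

    Returns the list of ports that are still busy (empty if all are free),
    so the caller can decide whether to start the container anyway.
    """
    deadline = time.monotonic() + max_delay_secs
    busy = [p for p in ports if not port_is_free(p, host)]
    while busy and time.monotonic() < deadline:
        time.sleep(poll_secs)
        busy = [p for p in busy if not port_is_free(p, host)]
    return busy
```

Compared with a fixed sleep, this only delays the start for as long as a port
is actually held, and the maximum delay still bounds the wait when the prior
process never exits.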
> Add a sleep before container restart as ports may not be released from the
> last activation
> ------------------------------------------------------------------------------------------
>
> Key: SLIDER-594
> URL: https://issues.apache.org/jira/browse/SLIDER-594
> Project: Slider
> Issue Type: Bug
> Components: agent-provider
> Affects Versions: Slider 0.50
> Reporter: Sumit Mohanty
> Assignee: Jonathan Maron
> Priority: Critical
> Fix For: Slider 0.60
>
>
> This is critical for applications that do not use dynamic ports, and
> applications using labels do not use dynamic ports. A configurable delay
> should be added to allow for scenarios where component instances get killed
> rather than stopped.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)