[
https://issues.apache.org/jira/browse/FLINK-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959444#comment-15959444
]
ASF GitHub Bot commented on FLINK-5974:
---------------------------------------
GitHub user vijikarthi opened a pull request:
https://github.com/apache/flink/pull/3692
FLINK-5974 Added configurations to support mesos-dns hostname resolution
This PR addresses the requirements of FLINK-5974: dynamic hostname
resolution for the JM and TM components, especially in deployment
environments such as Mesos/DCOS.
It adds two main features.
a) Dynamic host name configuration
Support for specifying the hostname of the JM/TM is already available through
the `jobmanager.rpc.address` and `taskmanager.hostname` configurations.
However, in a Mesos DC/OS type of environment, each task container can be
looked up using a hostname alias derived from the format
`<task>.<service>.mesos`, where service discovery is managed through
`mesos-dns`. To support this dynamic hostname lookup, we have introduced a new
configuration, `mesos.resourcemanager.tasks.hostname`, which takes the format
`_TASK.<ANY_VALUE>`.
When this property is supplied, the `_TASK` token is replaced with the
`TASK_ID` of the TM container, and the resulting string is used to
populate the `taskmanager.hostname` configuration.
For example, in a DCOS setup one could supply the configuration as
`-Dmesos.resourcemanager.tasks.hostname=_TASK.{{FRAMEWORK_NAME}}.mesos`, where
`FRAMEWORK_NAME` could be `flink`.
Please refer to
https://docs.mesosphere.com/1.9/usage/service-discovery/mesos-dns/service-naming/#a-records
for more details on how Mesos service discovery works.
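The token substitution described above can be sketched as follows. This is a minimal illustration in Python, not the code from the PR: the function name `derive_taskmanager_hostname` and the task ID `taskmanager-00001` are hypothetical, and the sketch assumes only the first `_TASK` occurrence is replaced.

```python
TASK_TOKEN = "_TASK"


def derive_taskmanager_hostname(template: str, task_id: str) -> str:
    """Replace the `_TASK` token in `template` with the given task ID.

    `template` corresponds to the value of
    `mesos.resourcemanager.tasks.hostname`; the result is what would be
    used to populate `taskmanager.hostname`.
    """
    if TASK_TOKEN not in template:
        raise ValueError(f"template {template!r} does not contain {TASK_TOKEN}")
    # Assumption: only the first occurrence of the token is substituted.
    return template.replace(TASK_TOKEN, task_id, 1)


# Hypothetical task ID; the real ID is assigned by the framework at launch.
print(derive_taskmanager_hostname("_TASK.flink.mesos", "taskmanager-00001"))
# prints "taskmanager-00001.flink.mesos"
```

With `FRAMEWORK_NAME` set to `flink`, the derived name follows the `<task>.<service>.mesos` pattern that `mesos-dns` serves.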
b) Support for running *any* bootstrap script prior to executing the TM startup script
Currently, the TM boot script `mesos-taskmanager.sh` is the only script
that is passed to the Mesos launcher for booting the TM container.
In a DC/OS environment where service discovery is common, we need a mechanism
to wait until the service discovery records are available and the hostname is
actually resolvable before launching the TM boot script.
A DCOS deployment offers a way to validate and wait for the service discovery
records to be available before launching the tasks. See the links below for
more details on how it works.
https://mesosphere.github.io/dcos-commons/developer-guide.html#task-bootstrap
https://github.com/mesosphere/dcos-commons/blob/master/sdk/bootstrap/main.go
To support this, we have introduced a new configuration,
`mesos.resourcemanager.tasks.cmd-prefix=$FLINK_HOME/bin/bootstrap`, which names
an executable/script that is run prior to executing the TM
bootstrap command.
This feature *currently* works *only for Docker-based images*, where the
bootstrap script can be pre-baked into a specific location that can be used to
configure `mesos.resourcemanager.tasks.cmd-prefix`.
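To illustrate the kind of check a bootstrap executable might perform before the TM boot script runs, here is a minimal Python sketch that polls DNS until a hostname resolves. It is not the dcos-commons bootstrap and not part of this PR; the function name `wait_until_resolvable` and all of its parameters are hypothetical.

```python
import socket
import time


def wait_until_resolvable(hostname, timeout_s=60.0, interval_s=1.0,
                          resolve=socket.gethostbyname,
                          sleep=time.sleep, clock=time.monotonic):
    """Poll until `hostname` resolves, then return the resolved address.

    Raises TimeoutError if the name is still unresolvable after `timeout_s`
    seconds. `resolve`, `sleep` and `clock` are injectable for testing.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            return resolve(hostname)
        except OSError:
            # socket.gaierror (lookup failure) is a subclass of OSError.
            if clock() >= deadline:
                raise TimeoutError(
                    f"{hostname} not resolvable after {timeout_s}s")
            sleep(interval_s)


if __name__ == "__main__":
    # "localhost" stands in for a name like <task>.<service>.mesos.
    print(wait_until_resolvable("localhost", timeout_s=5.0))
```

A script along these lines, baked into the Docker image, could be the target of `mesos.resourcemanager.tasks.cmd-prefix`, so that the TM only starts once its own service discovery record is visible.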
Although both features were built to address Mesos/DCOS types of
deployment, the implementation is agnostic of these environments and
can be used for any generic deployment that may need such a facility.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/vijikarthi/flink FLINK-5974-Master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/3692.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3692
----
commit aeb432dc7fe8bcdd5faa49b8ad5dfb5630ea0747
Author: Vijay Srinivasaraghavan <[email protected]>
Date: 2017-04-06T16:48:39Z
FLINK-5974 Added configurations to support mesos-dns hostname resolution
----
> Support Mesos DNS
> -----------------
>
> Key: FLINK-5974
> URL: https://issues.apache.org/jira/browse/FLINK-5974
> Project: Flink
> Issue Type: Improvement
> Components: Cluster Management, Mesos
> Reporter: Eron Wright
> Assignee: Vijay Srinivasaraghavan
>
> In certain Mesos/DCOS environments, the slave hostnames aren't resolvable.
> For this and other reasons, Mesos DNS names would ideally be used for
> communication within the Flink cluster, not the hostname discovered via
> `InetAddress.getLocalHost`.
> Some parts of Flink are already configurable in this respect, notably
> `jobmanager.rpc.address`. However, the Mesos AppMaster doesn't use that
> setting for everything (e.g. artifact server), it uses the hostname.
> Similarly, the `taskmanager.hostname` setting isn't used in Mesos deployment
> mode. To effectively use Mesos DNS, the TM should use
> `<task-name>.<framework-name>.mesos` as its hostname. This could be derived
> from an interpolated configuration string.