[ 
https://issues.apache.org/jira/browse/FLINK-11632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16774031#comment-16774031
 ] 

Alex commented on FLINK-11632:
------------------------------

[~uce], [~till.rohrmann],

Because this change risks breaking backwards compatibility, I propose the 
following rollout steps:

 1. Introduce a configuration option for the TM's address binding (the values 
are the same as proposed in this ticket: hostname, ip, auto-detect-hostname; 
the last one is the default, to preserve the old behavior). Try to land this 
change in the Flink 1.8 release.

 2. Later, I'll try to find more evidence and arguments against using the 
heuristic by default. If that convinces you, make "hostname" the default in 
some future major Flink version (1.x).

 3. Once the heuristic is no longer the default, try to understand when users 
still enable it explicitly, whether there are alternatives (like 
{{taskmanager.host}}), and whether the Flink cluster in those cases runs in 
production or in an experimental environment (like Docker containers on a 
personal computer).
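
A minimal sketch of how step 1 might look in code, assuming a hypothetical {{BindPolicySketch}} class (none of these names exist in Flink). An unset option maps to the heuristic, so existing deployments would keep their current behavior; the heuristic itself is passed in as a supplier rather than implemented here.

```java
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Locale;
import java.util.function.Supplier;

/** Hypothetical sketch of the proposed bind policy; not Flink's actual API. */
final class BindPolicySketch {

    /** The three values proposed in the ticket. */
    enum Policy {
        HOSTNAME("hostname"),
        IP("ip"),
        AUTO_DETECT_HOSTNAME("auto-detect-hostname");

        final String configValue;

        Policy(String configValue) {
            this.configValue = configValue;
        }

        /** Parses the option value; an unset value keeps the old heuristic. */
        static Policy fromConfig(String raw) {
            if (raw == null || raw.trim().isEmpty()) {
                return AUTO_DETECT_HOSTNAME;
            }
            String normalized = raw.trim().toLowerCase(Locale.ROOT);
            for (Policy p : values()) {
                if (p.configValue.equals(normalized)) {
                    return p;
                }
            }
            throw new IllegalArgumentException("Unknown bind policy: " + raw);
        }
    }

    /**
     * Returns the address the TM would advertise. The heuristic is supplied
     * by the caller (assumption: it returns a hostname or ip as a string).
     */
    static String resolve(Policy policy, Supplier<String> heuristic) {
        try {
            switch (policy) {
                case HOSTNAME:
                    return InetAddress.getLocalHost().getHostName();
                case IP:
                    return InetAddress.getLocalHost().getHostAddress();
                default:
                    return heuristic.get();
            }
        } catch (UnknownHostException e) {
            // getLocalHost() fails when the local hostname cannot be resolved.
            throw new UncheckedIOException(e);
        }
    }
}
```

Switching the default in step 2 would then only mean changing the value returned for an unset option.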

> Make TaskManager automatic bind address picking more explicit (by default) 
> and more configurable
> ------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-11632
>                 URL: https://issues.apache.org/jira/browse/FLINK-11632
>             Project: Flink
>          Issue Type: Improvement
>          Components: Distributed Coordination, Network, TaskManager
>            Reporter: Alex
>            Assignee: Alex
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, there is an optional {{taskmanager.host}} configuration option in 
> {{flink-conf.yaml}} that lets users "statically" pre-define the bind address 
> the TaskManager listens on (note: it's also possible to override this option 
> by passing the corresponding command line option to Flink).
> When the option is not set, the TaskManager tries to [heuristically pick 
> a bind 
> address|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskManagerRunner.java#L421-L442].
> The resulting address (hostname) is used to advertise the service 
> endpoints running in the TM to the JobManager. It is also resolved later to an 
> {{[InetAddress|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskManagerRunner.java#L359]}},
>  which is used as the bind address for the TMs' inter-node communication.
> This proposal is to minimize usage of heuristics (by default) by introducing 
> a new configuration option (for example, {{taskmanager.host.bind-policy}}) 
> with possible values:
>  * {{"hostname"}} - default, use TM's host's name ({{== 
> InetAddress.getLocalHost().getHostName()}});
>  * {{"ip"}} - use TM's host's ip address ({{== 
> InetAddress.getLocalHost().getHostAddress()}});
>  * {{"auto-detect-hostname"}} - use the heuristics based detection mechanism.
> *Note:* the configuration key and values could be named better; I'm open to 
> proposals.
> *Note 2:* in the future, the configuration option _may_ need to be 
> extended to allow choosing a specific network interface, or a preference for 
> ipv6 vs ipv4.
> h3. Rationale
> [The heuristics 
> mechanism|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/net/ConnectionUtils.java#L364-L475]
>  tries to establish a probe connection to {{jobmanager.rpc.address}} from 
> different network interface addresses. 
>  In parallel setups (when the JM and multiple TMs start simultaneously), 
> the outcome depends on timing and the assigned network ip addresses, and may 
> end up with "non-uniform" address bindings of TMs (some may be "lucky" and 
> pick up a non-default network interface, others fall back to 
> {{InetAddress.getLocalHost().getHostName()}}). In the end, it's less obvious 
> and transparent which bind address a TM picks up.
> In practice, it's possible that in the majority of cases (in well set up 
> environments) the heuristics mechanism returns a result that matches 
> {{InetAddress.getLocalHost()}}. The proposal is to stick with this 
> simpler and more explicit binding (by default), avoiding the non-determinism 
> of the heuristics.
> The old mechanism is kept available in case it is useful in some setups, 
> but it would require an explicit configuration setting.
> Additionally, this proposal extends the "auto configuration" option by 
> allowing users to choose the host's ip address (instead of the hostname). 
> This may be convenient in situations where the TMs' machines are not 
> necessarily reachable via DNS (for example in a Kubernetes setup).
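
The probing idea from the Rationale can be approximated as follows. This is a deliberately simplified sketch, not a reproduction of Flink's {{ConnectionUtils}}: the class {{ProbeSketch}} and its method names are hypothetical, and the real mechanism is more elaborate. It shows why the outcome depends on timing: if the target is not yet reachable when the probe runs, every candidate fails and the fallback is returned.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.NetworkInterface;
import java.net.Socket;
import java.net.SocketException;
import java.util.Collections;
import java.util.Enumeration;

/** Hypothetical, simplified probe; not Flink's actual implementation. */
final class ProbeSketch {

    /** Returns true if a TCP connection from 'local' to 'target' succeeds. */
    static boolean tryConnect(InetAddress local, InetSocketAddress target, int timeoutMs) {
        try (Socket socket = new Socket()) {
            // Bind to the candidate address (ephemeral port) before connecting.
            socket.bind(new InetSocketAddress(local, 0));
            socket.connect(target, timeoutMs);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unroutable from this address.
            return false;
        }
    }

    /**
     * Tries every address of every local interface and returns the first one
     * that can reach the target; returns 'fallback' if none can.
     */
    static InetAddress pickAddress(InetSocketAddress target, int timeoutMs, InetAddress fallback) {
        Enumeration<NetworkInterface> interfaces;
        try {
            interfaces = NetworkInterface.getNetworkInterfaces();
        } catch (SocketException e) {
            return fallback;
        }
        if (interfaces == null) {
            return fallback;
        }
        for (NetworkInterface nic : Collections.list(interfaces)) {
            for (InetAddress candidate : Collections.list(nic.getInetAddresses())) {
                if (tryConnect(candidate, target, timeoutMs)) {
                    return candidate;
                }
            }
        }
        return fallback;
    }
}
```

With a fallback of {{InetAddress.getLocalHost()}}, a TM that starts before the JM is reachable would end up with the local host address, while a TM that starts later may pick a different interface.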



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
