[ 
https://issues.apache.org/jira/browse/FLINK-11632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16774171#comment-16774171
 ] 

Till Rohrmann commented on FLINK-11632:
---------------------------------------

Sounds good to me [~1u0]. Thanks for addressing our comments.

> Make TaskManager automatic bind address picking more explicit (by default) 
> and more configurable
> ------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-11632
>                 URL: https://issues.apache.org/jira/browse/FLINK-11632
>             Project: Flink
>          Issue Type: Improvement
>          Components: Distributed Coordination, Network, TaskManager
>            Reporter: Alex
>            Assignee: Alex
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, there is an optional {{taskmanager.host}} configuration option in 
> {{flink-conf.yaml}} that allows users of Flink to "statically" pre-define 
> the bind address for the TaskManager to listen on (note: it's also possible 
> to override this option by passing the corresponding command line option to 
> Flink).
> When the option is not set, the TaskManager tries to [heuristically pick a 
> bind 
> address|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskManagerRunner.java#L421-L442].
> The resulting address (hostname) is used to advertise the service endpoints 
> running in the TM to the JobManager. It is also resolved later to an 
> {{[InetAddress|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskManagerRunner.java#L359]}}
>  that is used as the bind address for the TM's inter-node communication.
> This proposal is to minimize usage of heuristics (by default) by introducing 
> a new configuration option (for example, {{taskmanager.host.bind-policy}}) 
> with possible values:
>  * {{"hostname"}} - default, use TM's host's name ({{== 
> InetAddress.getLocalHost().getHostName()}});
>  * {{"ip"}} - use TM's host's ip address ({{== 
> InetAddress.getLocalHost().getHostAddress()}});
>  * {{"auto-detect-hostname"}} - use the heuristics based detection mechanism.
> *Note:* the configuration key and values could be named better and are open 
> for proposals.
> *Note 2:* in the future, the configuration option _may_ need to be extended 
> to allow choosing a specific network interface, or a preference of ipv6 vs 
> ipv4.
> h3. Rationale
> [The heuristics 
> mechanism|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/net/ConnectionUtils.java#L364-L475]
>  tries to establish a probe connection to {{jobmanager.rpc.address}} from 
> different network interface addresses. 
>  When JM and multiple TMs start simultaneously, in parallel, the outcome 
> depends on timing and on the assigned network ip addresses, and may end up 
> with "non-uniform" address bindings across TMs: some may be "lucky" and pick 
> up a non-default network interface, while others fall back to 
> {{InetAddress.getLocalHost().getHostName()}}. In the end, it is less obvious 
> and transparent which bind address a TM picks up.
> In practice, it's possible that in the majority of cases (in well set up 
> environments) the heuristics mechanism returns a result that matches 
> {{InetAddress.getLocalHost()}}. The proposal is to stick with this simpler 
> and more explicit binding by default, avoiding the non-determinism of the 
> heuristics.
> The old mechanism is kept available in case it is useful in some setups, 
> but it would require an explicit configuration setting.
> Additionally, this proposal extends the "auto configuration" option by 
> allowing users to choose the host's ip address (instead of the hostname). 
> This may be convenient in situations where the TMs' machines are not 
> necessarily reachable via DNS (for example, in a Kubernetes setup).
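
For illustration only, here is a minimal sketch of how the three proposed policy values could map to an advertised host. It is not Flink's implementation: the class BindPolicySketch, the enum BindPolicy and the method resolveTaskManagerHost are hypothetical names, and the sketch assumes that an explicitly set taskmanager.host always takes precedence over the policy. The heuristic branch is left unimplemented on purpose.

```java
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Locale;

/** Hypothetical sketch of the proposed bind policy; not Flink code. */
class BindPolicySketch {

    /** The three values listed in the proposal; the enum itself is an assumption. */
    enum BindPolicy {
        HOSTNAME("hostname"),
        IP("ip"),
        AUTO_DETECT_HOSTNAME("auto-detect-hostname");

        private final String configValue;

        BindPolicy(String configValue) {
            this.configValue = configValue;
        }

        /** Parses the configured value; an unset value falls back to the proposed default. */
        static BindPolicy fromConfig(String value) {
            if (value == null || value.trim().isEmpty()) {
                return HOSTNAME;
            }
            String normalized = value.trim().toLowerCase(Locale.ROOT);
            for (BindPolicy policy : values()) {
                if (policy.configValue.equals(normalized)) {
                    return policy;
                }
            }
            throw new IllegalArgumentException("Unknown bind policy: " + value);
        }
    }

    /**
     * Returns the host to advertise. An explicitly configured host (assumed to
     * correspond to taskmanager.host) wins over the policy.
     */
    static String resolveTaskManagerHost(String explicitHost, BindPolicy policy) {
        if (explicitHost != null && !explicitHost.isEmpty()) {
            return explicitHost;
        }
        try {
            switch (policy) {
                case HOSTNAME:
                    return InetAddress.getLocalHost().getHostName();
                case IP:
                    return InetAddress.getLocalHost().getHostAddress();
                case AUTO_DETECT_HOSTNAME:
                    // The existing probe-connection heuristics would go here.
                    throw new UnsupportedOperationException(
                            "heuristic detection is out of scope for this sketch");
                default:
                    throw new IllegalStateException("Unhandled policy: " + policy);
            }
        } catch (UnknownHostException e) {
            // The local host name could not be resolved to an address.
            throw new UncheckedIOException(e);
        }
    }
}
```

With this shape, resolveTaskManagerHost(null, BindPolicy.fromConfig("ip")) would return the local host's ip address, while any non-empty explicit host is returned unchanged regardless of the policy.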



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
