[ 
https://issues.apache.org/jira/browse/FLINK-25764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17481136#comment-17481136
 ] 

Chesnay Schepler commented on FLINK-25764:
------------------------------------------

> So, the TaskManager without argument has its own container id as hostname for 
> the `JOB_MANAGER_RPC_ADDRESS`. I don't see how this works anyway.

It's not meant to work for distributed clusters, just like the default in the 
flink config ("localhost") isn't.

I don't know why we use a different default in the docker images; it was 
already there 5 years ago when the docker images were contributed to Flink (in 
fact it is part of the very first commit).
Maybe there is some functional difference between {{localhost}} and 
{{$(hostname -f)}}; I couldn't find one in some quick tests. A cluster started 
just fine within a single container, and port-forwarding worked as expected.
Maybe there was never a difference, or something has changed on the docker side 
in the last 5 years. I just don't know, which is why I'm apprehensive about 
changing it.
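
For reference, the default under discussion boils down to a shell fallback 
along the following lines. This is a minimal sketch, not a copy of the actual 
entrypoint script: the function name resolve_jm_address is hypothetical and 
only used for illustration, while JOB_MANAGER_RPC_ADDRESS and the 
$(hostname -f) fallback are the ones mentioned in this ticket.

```shell
#!/usr/bin/env bash
# Sketch only: resolve_jm_address is a hypothetical helper, not part of flink-docker.
resolve_jm_address() {
    # ${VAR:-default} expands to the default when VAR is unset or empty,
    # so an explicitly provided address always wins over the container's
    # own fully qualified hostname.
    echo "${JOB_MANAGER_RPC_ADDRESS:-$(hostname -f)}"
}
```

With the variable unset, a TaskManager container would resolve to its own 
hostname, which only matches the JobManager when both run in the same 
container. Replacing $(hostname -f) with localhost in the sketch corresponds 
to the alternative default discussed above.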

> Docker sets JobManager's rpc address to same host by default
> ------------------------------------------------------------
>
>                 Key: FLINK-25764
>                 URL: https://issues.apache.org/jira/browse/FLINK-25764
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / Kubernetes, flink-docker
>    Affects Versions: 1.15.0
>            Reporter: Niklas Semmler
>            Priority: Minor
>              Labels: usability
>
> In the [docker 
> entrypoint|https://github.com/apache/flink-docker/blob/master/1.14/scala_2.12-java8-debian/docker-entrypoint.sh],
>  the JOB_MANAGER_RPC_ADDRESS is set to the current host by default (line 25). 
> This environment variable overrides the value set for jobmanager.rpc.address 
> in the flink config (line 78, 71). For the TaskManager, this means that it 
> tries to find the JobManager on the same host. When this is not the case, the 
> TaskManager will retry and ultimately terminate. Hence, for cluster 
> deployments, the variable has to be defined when starting the docker.
> For Kubernetes deployments, the TaskManager cannot connect to the 
> jobmanager.rpc.address even when it is defined by the flink configmap. 
> However, we don't see this problem pop up, because for now the configmap is 
> mounted read-only into the containers (see FLINK-21383 for more details).
> To simplify this configuration, I propose to (a) never set a default setting 
> for JOB_MANAGER_RPC_ADDRESS at all or (b) never set a default setting for any 
> non-JobManager container. The only down-side is that all docker deployments 
> will have to define JOB_MANAGER_RPC_ADDRESS, even when TaskManager and 
> JobManager run on the same node.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)