[ https://issues.apache.org/jira/browse/FLINK-25764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17481136#comment-17481136 ]
Chesnay Schepler commented on FLINK-25764:
------------------------------------------
> So, the TaskManager without argument has its own container id as hostname for
> the `JOB_MANAGER_RPC_ADDRESS`. I don't see how this works anyway.
It's not meant to work for distributed clusters, just like the default in the
Flink config ("localhost") isn't.
I don't know why we use a different default in the Docker images; it was
already there 5 years ago when the Docker images were contributed to Flink (in
fact, it is part of the very first commit).
Maybe there is some functional difference between {{localhost}} and
{{$(hostname -f)}}; I couldn't find one in a few quick tests. A cluster started
just fine within a single container, and port-forwarding worked as expected.
Maybe there was never a difference, or something has changed on the Docker side
in the last 5 years. I just don't know, which is why I'm apprehensive about
changing it.
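To make the defaulting concrete, here is a minimal shell sketch, assuming the entrypoint falls back to `$(hostname -f)` when JOB_MANAGER_RPC_ADDRESS is unset and then writes that value into `jobmanager.rpc.address` in a flink-conf.yaml-style file. The helper `set_config_option` and the temporary config file are illustrative stand-ins, not the actual docker-entrypoint.sh.

```shell
#!/usr/bin/env bash
# Simplified sketch of the defaulting behaviour discussed here; this is NOT
# the actual docker-entrypoint.sh. set_config_option is an illustrative helper.

# Write "key: value" into a flink-conf.yaml-style file, replacing an existing
# entry for the key or appending one if none exists.
set_config_option() {
    local conf_file="$1" key="$2" value="$3" key_re
    # Escape the dots in keys such as jobmanager.rpc.address for the regexes.
    key_re=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -q "^${key_re}:" "$conf_file"; then
        sed "s|^${key_re}:.*|${key}: ${value}|" "$conf_file" > "$conf_file.tmp" &&
            mv "$conf_file.tmp" "$conf_file"
    else
        printf '%s: %s\n' "$key" "$value" >> "$conf_file"
    fi
}

conf_file=$(mktemp)
trap 'rm -f "$conf_file"' EXIT
# The user's configuration points at a separate JobManager host.
printf 'jobmanager.rpc.address: flink-jobmanager\n' > "$conf_file"

# Image-style default: fall back to this container's own FQDN when unset.
# (The default in flink-conf.yaml itself would be "localhost" instead.)
JOB_MANAGER_RPC_ADDRESS=${JOB_MANAGER_RPC_ADDRESS:-$(hostname -f 2>/dev/null)}

# Because the variable now always has a value, it overwrites the file's entry.
set_config_option "$conf_file" jobmanager.rpc.address "$JOB_MANAGER_RPC_ADDRESS"
cat "$conf_file"
```

Run without JOB_MANAGER_RPC_ADDRESS set, the printed entry is the current host's FQDN rather than `flink-jobmanager`, so a TaskManager configured this way looks for the JobManager on its own host. Swapping the fallback for `localhost` would only change which name refers to that same host.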
> Docker sets JobManager's rpc address to same host by default
> ------------------------------------------------------------
>
> Key: FLINK-25764
> URL: https://issues.apache.org/jira/browse/FLINK-25764
> Project: Flink
> Issue Type: Bug
> Components: Deployment / Kubernetes, flink-docker
> Affects Versions: 1.15.0
> Reporter: Niklas Semmler
> Priority: Minor
> Labels: usability
>
> In the [docker
> entrypoint|https://github.com/apache/flink-docker/blob/master/1.14/scala_2.12-java8-debian/docker-entrypoint.sh],
> JOB_MANAGER_RPC_ADDRESS is set to the current host by default (line 25).
> This environment variable overrides the value set for jobmanager.rpc.address
> in the Flink config (lines 71 and 78). For the TaskManager, this means that it
> tries to find the JobManager on the same host. When the JobManager runs
> elsewhere, the TaskManager retries and ultimately terminates. Hence, for
> cluster deployments, the variable has to be set when starting the Docker
> container.
> For Kubernetes deployments, the TaskManager would likewise fail to connect to
> the jobmanager.rpc.address defined in the Flink ConfigMap, because the
> entrypoint would overwrite it. We don't see this problem today only because
> the ConfigMap is currently mounted read-only into the containers (see
> FLINK-21383 for more details).
> To simplify this configuration, I propose to either (a) never set a default
> for JOB_MANAGER_RPC_ADDRESS at all or (b) never set a default for any
> non-JobManager container. The only downside is that all Docker deployments
> will have to define JOB_MANAGER_RPC_ADDRESS, even when the TaskManager and
> JobManager run on the same node.
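Proposal (b) could look roughly like the following shell sketch, assuming the container's role is passed as the first argument (the role names `jobmanager`, `standalone-job` and `taskmanager` are assumptions here). The function `resolve_jobmanager_address` is hypothetical and not part of flink-docker.

```shell
#!/usr/bin/env bash
# Sketch of proposal (b): only JobManager containers get a default address.
# resolve_jobmanager_address and the role names are illustrative assumptions,
# not actual flink-docker code.
resolve_jobmanager_address() {
    local role="$1"
    if [ -n "${JOB_MANAGER_RPC_ADDRESS:-}" ]; then
        # An explicitly configured address always wins.
        printf '%s\n' "$JOB_MANAGER_RPC_ADDRESS"
        return 0
    fi
    case "$role" in
        jobmanager|standalone-job)
            # The JobManager can safely point at its own host.
            hostname -f
            ;;
        *)
            # Everything else (e.g. a TaskManager) must be told explicitly.
            echo "JOB_MANAGER_RPC_ADDRESS must be set for role '$role'" >&2
            return 1
            ;;
    esac
}

unset JOB_MANAGER_RPC_ADDRESS
# Fails fast instead of retrying against its own hostname.
resolve_jobmanager_address taskmanager || echo "taskmanager: refusing to start"

JOB_MANAGER_RPC_ADDRESS=flink-jobmanager
resolve_jobmanager_address taskmanager    # prints flink-jobmanager
```

Option (a) corresponds to dropping the `jobmanager|standalone-job` branch as well, so that every container must set the variable, which is the downside noted in the proposal.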
--
This message was sent by Atlassian Jira
(v8.20.1#820001)