[ https://issues.apache.org/jira/browse/FLINK-11127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16724737#comment-16724737 ]
Nagarjun Guraja commented on FLINK-11127:
-----------------------------------------
[~spoganshev] I am wondering: wouldn't modifying the Docker entrypoint script to first
configure *taskmanager.host* with the pod IP and then invoke taskmanager.sh also do
the trick, instead of using an init container? Do you see any issue with that approach,
other than putting the workaround into the Docker image as opposed to handling it
externally?
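A minimal sketch of the entrypoint approach suggested in this comment, written in Python for illustration (a real entrypoint would more likely be a shell script). It assumes the pod IP is exposed to the container as an environment variable named `POD_IP` (for example via the Kubernetes Downward API field `status.podIP`) and that the config lives at `$FLINK_HOME/conf/flink-conf.yaml` with `/opt/flink` as a fallback; these names and paths are assumptions, not taken from the Flink image. The script sets *taskmanager.host* and then hands the process over to taskmanager.sh.

```python
import os
import sys


def set_taskmanager_host(conf_text: str, pod_ip: str) -> str:
    """Return conf_text with taskmanager.host set to pod_ip.

    Any existing taskmanager.host entry is dropped and a fresh one is appended,
    so the pod IP always wins.
    """
    key = "taskmanager.host"
    lines = conf_text.splitlines()
    kept = [ln for ln in lines if not ln.strip().startswith(key + ":")]
    kept.append(f"{key}: {pod_ip}")
    return "\n".join(kept) + "\n"


def main() -> None:
    """Entrypoint sketch: rewrite the config, then exec the TaskManager script."""
    # Assumption: POD_IP is injected via the Downward API (status.podIP).
    pod_ip = os.environ["POD_IP"]
    # Assumption: FLINK_HOME defaults to /opt/flink in this hypothetical image.
    flink_home = os.environ.get("FLINK_HOME", "/opt/flink")
    conf_path = os.path.join(flink_home, "conf", "flink-conf.yaml")
    with open(conf_path) as f:
        updated = set_taskmanager_host(f.read(), pod_ip)
    with open(conf_path, "w") as f:
        f.write(updated)
    # Replace this process with taskmanager.sh, forwarding any extra arguments.
    tm_script = os.path.join(flink_home, "bin", "taskmanager.sh")
    os.execv(tm_script, [tm_script, "start-foreground", *sys.argv[1:]])
```

The image would call `main()` as its entrypoint. Because the JM then sees an IP-based address for the metrics query service instead of the pod name, no per-TM Kubernetes service is needed; the trade-off, as noted above, is that the workaround lives inside the image.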
> Make metrics query service establish connection to JobManager
> -------------------------------------------------------------
>
> Key: FLINK-11127
> URL: https://issues.apache.org/jira/browse/FLINK-11127
> Project: Flink
> Issue Type: Improvement
> Components: Distributed Coordination, Kubernetes, Metrics
> Affects Versions: 1.7.0
> Reporter: Ufuk Celebi
> Priority: Major
>
> As part of FLINK-10247, the internal metrics query service has been separated
> into its own actor system. Before this change, the JobManager (JM) queried
> TaskManager (TM) metrics via the TM actor. Now, the JM needs to establish a
> separate connection to the TM metrics query service actor.
> In the context of Kubernetes, this is problematic as the JM will typically
> *not* be able to resolve the TMs by name, resulting in warnings as follows:
> {code}
> 2018-12-11 08:32:33,962 WARN akka.remote.ReliableDeliverySupervisor
> - Association with remote system
> [akka.tcp://flink-metrics@flink-task-manager-64b868487c-x9l4b:39183] has
> failed, address is now gated for [50] ms. Reason: [Association failed with
> [akka.tcp://flink-metrics@flink-task-manager-64b868487c-x9l4b:39183]] Caused
> by: [flink-task-manager-64b868487c-x9l4b: Name does not resolve]
> {code}
> In order to expose the TMs by name in Kubernetes, users require a service
> *for each* TM instance, which is not practical.
> This currently results in the web UI not being able to display some basic
> metrics, such as the number of sent records. You can reproduce this by following
> the READMEs in {{flink-container/kubernetes}}.
> This worked before because the JM is typically exposed via a service with a
> known name, and the TMs establish the connection to it; the metrics query
> service piggybacked on that connection.
> A potential solution to this might be to let the query service connect to the
> JM similar to how the TMs register.
> I tagged this ticket as an improvement, but in the context of Kubernetes I
> would consider this to be a bug.
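The registration-based alternative from the issue description (letting the query service connect to the JM, similar to how the TMs register) can be illustrated with a toy Python model. Everything here is hypothetical: `CLUSTER_DNS`, `JobManager`, `task_manager_start`, the service name `flink-jobmanager`, and the IPs are not Flink APIs or real addresses. The point is only that when the TM side dials the JM's well-known name and reports a reachable IP, the JM never has to resolve a TM pod name.

```python
from typing import Dict

# Toy cluster DNS: as in a typical Kubernetes setup, only the JM's Service name
# resolves; individual TM pod names do not. Names and IPs are hypothetical.
CLUSTER_DNS: Dict[str, str] = {"flink-jobmanager": "10.0.0.10"}


def resolve(name: str) -> str:
    """Resolve a name via the toy DNS, failing like the warning in the issue."""
    if name not in CLUSTER_DNS:
        raise LookupError(f"{name}: Name does not resolve")
    return CLUSTER_DNS[name]


class JobManager:
    """Hypothetical JM that tracks TM metrics endpoints (not Flink's API)."""

    def __init__(self) -> None:
        self.registered: Dict[str, str] = {}

    def endpoint_by_name(self, tm_hostname: str, port: int) -> str:
        # Pull model (current behaviour): the JM must resolve the TM pod name.
        return f"akka.tcp://flink-metrics@{resolve(tm_hostname)}:{port}"

    def register_metrics_service(self, tm_id: str, ip: str, port: int) -> None:
        # Push model (proposed): the TM dials in and reports a reachable IP.
        self.registered[tm_id] = f"akka.tcp://flink-metrics@{ip}:{port}"

    def endpoint_by_registration(self, tm_id: str) -> str:
        # No lookup of the TM name is needed; the address came from the TM.
        return self.registered[tm_id]


def task_manager_start(jm_service: str, tm_id: str, pod_ip: str, port: int,
                       jm: JobManager) -> None:
    """TM side of the push model: resolve the JM's stable name, then register."""
    resolve(jm_service)  # succeeds because the JM is exposed via a Service
    jm.register_metrics_service(tm_id, pod_ip, port)
```

In this model, `endpoint_by_name("flink-task-manager-64b868487c-x9l4b", 39183)` raises `LookupError`, mirroring the gated association in the log, while a TM that has called `task_manager_start` is reachable through `endpoint_by_registration`.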
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)