[jira] [Commented] (FLINK-24031) I am trying to deploy Flink in kubernetes but when I launch the taskManager in other container I get a Exception

Sylvia Lin (Jira) Wed, 13 Jul 2022 22:03:10 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-24031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17566640#comment-17566640
 ]


Sylvia Lin commented on FLINK-24031:
------------------------------------

[~wangyang0918] Yes, I'm using the ConfigMap below, and the exact same setup 
works on another EKS cluster. I can confirm that this EKS cluster is not 
working correctly: it cannot resolve the host flink-jobmanager, although other 
DNS resolution works fine on the same cluster:
{code:java}
~$ curl flink-jobmanager
curl: (6) Could not resolve host: flink-jobmanager {code}
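To separate a name-resolution failure from anything curl or HTTP might add, a 
small check like the one below could be run from inside a pod on the affected 
cluster. This is only a sketch: the resolve_host helper is a hypothetical name, 
and the port default of 6123 is simply taken from jobmanager.rpc.port in the 
ConfigMap.

```python
import socket


def resolve_host(hostname, port=6123):
    """Return the sorted unique IP addresses for hostname, or [] if it does not resolve."""
    try:
        # Ask the system resolver (the same one curl uses) for TCP addresses.
        infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # Raised when the name cannot be resolved, as in the curl error above.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

Calling resolve_host("flink-jobmanager") and then the fully qualified form of 
the same Service name (assuming the usual <service>.<namespace>.svc.<cluster 
domain> layout) would show whether only the short name fails, which points at 
the pod's DNS search path rather than at the Service itself.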
configMap:
{code:java}
apiVersion: v1
kind: ConfigMap
metadata:
  name: flink-config
  labels:
    app: flink
data:
  flink-conf.yaml: |+
    kubernetes.cluster-id: <cluster_name>
    fs.allowed-fallback-filesystems: s3
    state.backend: rocksdb
    state.backend.incremental: true
    state.backend.local-recovery: true
    jobmanager.rpc.address: flink-jobmanager
    taskmanager.numberOfTaskSlots: 2
    blob.server.port: 6124
    jobmanager.rpc.port: 6123
    taskmanager.rpc.port: 6122
    jobmanager.memory.process.size: 1600m
    taskmanager.memory.process.size: 1728m  
    restart-strategy: fixeddelay
    restart-strategy.fixed-delay.attempts: 100000
    scheduler-mode: reactive
    metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
    heartbeat.timeout: 8000
    heartbeat.interval: 5000
    rest.flamegraph.enabled: true
    hive.s3.use-instance-credentials: true {code}
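
With this configuration the TaskManagers reach the JobManager at the host and 
port given by jobmanager.rpc.address and jobmanager.rpc.port, so the name 
flink-jobmanager has to resolve from the TaskManager pods. As a rough 
illustration, the sketch below reads those keys from flat "key: value" lines 
and attempts a plain TCP connection. The helper names parse_flink_conf and 
can_connect are hypothetical and not part of Flink, and the parser only 
handles simple one-line entries, not full YAML.

```python
import socket


def parse_flink_conf(text):
    """Parse flat 'key: value' lines, as used in flink-conf.yaml, into a dict."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not a key/value pair.
        if not line or line.startswith("#") or ":" not in line:
            continue
        # Split on the first colon only, so values containing ':' stay intact.
        key, _, value = line.partition(":")
        conf[key.strip()] = value.strip()
    return conf


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

For the ConfigMap above, conf["jobmanager.rpc.address"] is "flink-jobmanager" 
and int(conf["jobmanager.rpc.port"]) is 6123; passing those to can_connect 
from a TaskManager pod gives a quick yes/no on reachability.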

> I am trying to deploy Flink in kubernetes but when I launch the taskManager 
> in other container I get a Exception
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-24031
>                 URL: https://issues.apache.org/jira/browse/FLINK-24031
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / Kubernetes
>    Affects Versions: 1.13.0, 1.13.2
>            Reporter: Julio Pérez
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.13.1
>
>         Attachments: flink-map.yml, jobmanager.log, jobmanager.yml, 
> taskmanager.log, taskmanager.yml
>
>
>  I explain the problem here -> [https://github.com/apache/flink/pull/17020]
> I have a problem when I try to run Flink in k8s with the following manifests.
> I get the following exceptions:
>  # JobManager :
> {quote}2021-08-27 09:16:57,917 ERROR akka.remote.EndpointWriter [] - dropping 
> message [class akka.actor.ActorSelectionMessage] for non-local recipient 
> [Actor[akka.tcp://flink@jobmanager-hs:6123/]] arriving at 
> [akka.tcp://flink@jobmanager-hs:6123] inbound addresses are 
> [akka.tcp://flink@cluster:6123]
>  2021-08-27 09:17:01,255 DEBUG 
> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - 
> Trigger heartbeat request.
>  2021-08-27 09:17:01,284 DEBUG 
> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - 
> Trigger heartbeat request.
>  2021-08-27 09:17:10,008 DEBUG akka.remote.transport.netty.NettyTransport [] 
> - Remote connection to [/172.17.0.1:34827] was disconnected because of [id: 
> 0x13ae1d03, /172.17.0.1:34827 :> /172.17.0.23:6123] DISCONNECTED
>  2021-08-27 09:17:10,008 DEBUG akka.remote.transport.ProtocolStateActor [] - 
> Association between local [tcp://flink@cluster:6123] and remote 
> [tcp://[email protected]:34827] was disassociated because the 
> ProtocolStateActor failed: Unknown
>  2021-08-27 09:17:10,009 WARN akka.remote.ReliableDeliverySupervisor [] - 
> Association with remote system [akka.tcp://[email protected]:6122] has 
> failed, address is now gated for [50] ms. Reason: [Disassociated]
> {quote}
> TaskManager:
> {quote}INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Could not 
> resolve ResourceManager address 
> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager__, retrying 
> in 10000 ms: Could not connect to rpc endpoint under address 
> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager__.
>  INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Could not 
> resolve ResourceManager address 
> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager__, retrying 
> in 10000 ms: Could not connect to rpc endpoint under address 
> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager__.
> {quote}
> Best regards,
> Julio



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
