[https://issues.apache.org/jira/browse/FLINK-24031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17566938#comment-17566938]
Sylvia Lin edited comment on FLINK-24031 at 7/14/22 6:08 PM:
-------------------------------------------------------------
[~wangyang0918]
JM manifest:
{code:yaml}
apiVersion: batch/v1
kind: Job
metadata:
  name: flink-jobmanager-test
spec:
  template:
    metadata:
      annotations:
        prometheus.io/port: '9249'
        prometheus.io/scrape: 'true'
      labels:
        app: flink
        component: jobmanager
    spec:
      restartPolicy: OnFailure
      containers:
        - name: jobmanager
          image: <image_name>
          imagePullPolicy: Always
          env:
          args: ["standalone-job", "--job-classname", "TestJob.MainJob"]
          ports:
            - containerPort: 6123
              name: rpc
            - containerPort: 6124
              name: blob-server
            - containerPort: 8081
              name: webui
          livenessProbe:
            tcpSocket:
              port: 6123
            initialDelaySeconds: 30
            periodSeconds: 60
          volumeMounts:
            - name: flink-config-volume
              mountPath: /opt/flink/conf
          securityContext:
            runAsUser: 9999  # refers to user _flink_ from official flink image, change if necessary
      volumes:
        - name: flink-config-volume
          configMap:
            name: flink-config
            items:
              - key: flink-conf.yaml
                path: flink-conf.yaml
              - key: log4j-console.properties
                path: log4j-console.properties
{code}
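In a standalone (non-HA) setup the TaskManagers connect to the JobManager using the hostname configured as {{jobmanager.rpc.address}}, so the Job above usually needs a Service that makes that hostname resolvable. A minimal sketch, modeled on the {{jobmanager-service.yaml}} from Flink's standalone Kubernetes documentation; the name {{flink-jobmanager}} is an assumption and has to equal {{jobmanager.rpc.address}} in the {{flink-config}} ConfigMap, which is not included in this comment:

{code:yaml}
apiVersion: v1
kind: Service
metadata:
  name: flink-jobmanager   # assumed name; must equal jobmanager.rpc.address
spec:
  type: ClusterIP
  ports:
    - name: rpc
      port: 6123
    - name: blob-server
      port: 6124
    - name: webui
      port: 8081
  selector:                # matches the pod labels of the JobManager Job above
    app: flink
    component: jobmanager
{code}

Akka only accepts a message whose recipient address matches the JobManager's own address string exactly, so if the JobManager advertises one hostname (e.g. {{cluster}}) while peers address it by another (e.g. {{jobmanager-hs}}), it logs "dropping message ... for non-local recipient" instead of answering.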
TM manifest:
{code:yaml}
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-taskmanager-test
spec:
  replicas: 3  # here, we configure the scale
  selector:
    matchLabels:
      app: flink
      component: taskmanager
  template:
    metadata:
      annotations:
        prometheus.io/port: '9249'
        prometheus.io/scrape: 'true'
      labels:
        app: flink
        component: taskmanager
    spec:
      containers:
        - name: taskmanager
          image: <image_name>
          imagePullPolicy: Always
          resources:
            requests:
              cpu: "250m"
            limits:
              cpu: "500m"
          env:
          args: ["taskmanager"]
          ports:
            - containerPort: 6122
              name: rpc
            - containerPort: 6125
              name: query-state
          livenessProbe:
            tcpSocket:
              port: 6122
            initialDelaySeconds: 30
            periodSeconds: 60
          volumeMounts:
            - name: flink-config-volume
              mountPath: /opt/flink/conf/
          securityContext:
            runAsUser: 9999  # refers to user _flink_ from official flink image, change if necessary
      volumes:
        - name: flink-config-volume
          configMap:
            name: flink-config
            items:
              - key: flink-conf.yaml
                path: flink-conf.yaml
              - key: log4j-console.properties
                path: log4j-console.properties
{code}
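Both manifests mount the {{flink-config}} ConfigMap, which is not shown here. A minimal sketch of what it might contain, adapted from the ConfigMap in Flink's standalone Kubernetes documentation. Assumptions: the JobManager is reachable as {{flink-jobmanager}} (hypothetical Service name), and the slot, parallelism and memory values are the documentation's example sizes rather than tuned values. The ports mirror the {{containerPort}}s declared in the manifests, and a {{log4j-console.properties}} key is included because both {{items}} lists reference it:

{code:yaml}
apiVersion: v1
kind: ConfigMap
metadata:
  name: flink-config
  labels:
    app: flink
data:
  flink-conf.yaml: |+
    # Must be a hostname every TaskManager can resolve (assumed Service name)
    jobmanager.rpc.address: flink-jobmanager
    jobmanager.rpc.port: 6123
    blob.server.port: 6124
    taskmanager.rpc.port: 6122
    queryable-state.proxy.ports: 6125
    taskmanager.numberOfTaskSlots: 2
    parallelism.default: 2
    # Example sizes from the Flink docs; the whole conf dir is replaced by this
    # ConfigMap, so the image's default memory settings are no longer present
    jobmanager.memory.process.size: 1600m
    taskmanager.memory.process.size: 1728m
  log4j-console.properties: |+
    rootLogger.level = INFO
    rootLogger.appenderRef.console.ref = ConsoleAppender
    appender.console.name = ConsoleAppender
    appender.console.type = CONSOLE
    appender.console.layout.type = PatternLayout
    appender.console.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
{code}

The ConfigMap has to exist before the Job and Deployment are created; pods that mount a missing, non-optional ConfigMap stay in {{ContainerCreating}} until it appears.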
JM logs:
[^JM.log]
TM logs:
[^TM.log]
> I am trying to deploy Flink in Kubernetes, but when I launch the TaskManager
> in another container I get an Exception
> ----------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-24031
> URL: https://issues.apache.org/jira/browse/FLINK-24031
> Project: Flink
> Issue Type: Bug
> Components: Deployment / Kubernetes
> Affects Versions: 1.13.0, 1.13.2
> Reporter: Julio Pérez
> Priority: Major
> Labels: pull-request-available
> Attachments: JM.log, TM.log, flink-map.yml, jobmanager.log,
> jobmanager.yml, taskmanager.log, taskmanager.yml
>
>
> I explain the problem here: [https://github.com/apache/flink/pull/17020]
> I have a problem when I try to run Flink in k8s with the following manifests.
> I get the following exception:
> # JobManager:
> {quote}2021-08-27 09:16:57,917 ERROR akka.remote.EndpointWriter [] - dropping
> message [class akka.actor.ActorSelectionMessage] for non-local recipient
> [Actor[akka.tcp://flink@jobmanager-hs:6123/]] arriving at
> [akka.tcp://flink@jobmanager-hs:6123] inbound addresses are
> [akka.tcp://flink@cluster:6123]
> 2021-08-27 09:17:01,255 DEBUG
> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] -
> Trigger heartbeat request.
> 2021-08-27 09:17:01,284 DEBUG
> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] -
> Trigger heartbeat request.
> 2021-08-27 09:17:10,008 DEBUG akka.remote.transport.netty.NettyTransport []
> - Remote connection to [/172.17.0.1:34827] was disconnected because of [id:
> 0x13ae1d03, /172.17.0.1:34827 :> /172.17.0.23:6123] DISCONNECTED
> 2021-08-27 09:17:10,008 DEBUG akka.remote.transport.ProtocolStateActor [] -
> Association between local [tcp://flink@cluster:6123] and remote
> [tcp://flink@172.17.0.1:34827] was disassociated because the
> ProtocolStateActor failed: Unknown
> 2021-08-27 09:17:10,009 WARN akka.remote.ReliableDeliverySupervisor [] -
> Association with remote system [akka.tcp://[email protected]:6122] has
> failed, address is now gated for [50] ms. Reason: [Disassociated]
> {quote}
> TaskManager:
> {quote}INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Could not
> resolve ResourceManager address
> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager__, retrying
> in 10000 ms: Could not connect to rpc endpoint under address
> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager__.
> INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Could not
> resolve ResourceManager address
> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager__, retrying
> in 10000 ms: Could not connect to rpc endpoint under address
> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager__.
> {quote}
> Best regards,
> Julio
--
This message was sent by Atlassian Jira
(v8.20.10#820010)