ChuanHaiTan created FLINK-10775:
-----------------------------------

             Summary: Quarantined address 
[akka.tcp://flink@flink-jobmanager:6123] is still unreachable or has not been 
restarted. Keeping it quarantined.
                 Key: FLINK-10775
                 URL: https://issues.apache.org/jira/browse/FLINK-10775
             Project: Flink
          Issue Type: Bug
          Components: ResourceManager
    Affects Versions: 1.4.2
         Environment: k8s+docker 

standalone (1 jobmanager + 5 taskmanagers)

taskmanager.slotnum=4
            Reporter: ChuanHaiTan
         Attachments: 
logs-from-flink-jobmanager-in-flink-jobmanager-65c8d85f4f-5fm2d.txt, 
logs-from-flink-taskmanager-in-flink-taskmanager-758575577d-7lw82.txt, 
logs-from-flink-taskmanager-in-flink-taskmanager-758575577d-qbj9g.txt, 
微信图片_20181031171312.png, 微信图片_20181031171316.png

In a k8s+docker environment, 1 jobmanager container and 5 taskmanager 
containers run as a standalone cluster.

{color:#FF0000}But for some reason, the jobmanager was restarted, and two of the 
five taskmanagers were also restarted. Two of the remaining three taskmanagers 
did not reconnect to the jobmanager, so jobs fail with insufficient-slot 
errors.{color}

The attachments are the jobmanager log, the logs of the two disconnected 
taskmanagers, and screenshots from Flink showing all available and unavailable 
taskmanagers at the time.

It is strange that the two restarted taskmanagers can connect to the 
jobmanager, while only one of the three taskmanagers that were not restarted 
can connect.

Why? Can the cause of the restart be analyzed from the logs? Thank you.
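
As a starting point for reading the attached logs, here is a minimal sketch 
(plain Python, not part of Flink) that counts the quarantine warnings per remote 
address. The pattern is assumed from the message quoted in the summary above, 
and the sample lines and the function name are made up for illustration; real 
log lines may carry different prefixes.

```python
import re

# Pattern assumed from the warning text quoted in the issue summary.
QUARANTINE_RE = re.compile(r"Quarantined address \[(?P<address>[^\]]+)\]")


def quarantined_addresses(lines):
    """Count quarantine warnings per remote address in an iterable of log lines."""
    counts = {}
    for line in lines:
        match = QUARANTINE_RE.search(line)
        if match:
            address = match.group("address")
            counts[address] = counts.get(address, 0) + 1
    return counts


# Hypothetical sample lines, not copied from the attached logs.
sample = [
    "WARN Quarantined address [akka.tcp://flink@flink-jobmanager:6123] "
    "is still unreachable or has not been restarted. Keeping it quarantined.",
    "INFO some unrelated line",
    "WARN Quarantined address [akka.tcp://flink@flink-jobmanager:6123] "
    "is still unreachable or has not been restarted. Keeping it quarantined.",
]

print(quarantined_addresses(sample))
```

To run it against one of the attached taskmanager logs, pass the open file 
object in place of `sample`, since a file iterates line by line.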



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
