Petri created SPARK-37910:
-----------------------------

             Summary: Spark executor self-exiting due to driver disassociated 
in Kubernetes with client deploy-mode
                 Key: SPARK-37910
                 URL: https://issues.apache.org/jira/browse/SPARK-37910
             Project: Spark
          Issue Type: Bug
          Components: Kubernetes
    Affects Versions: 3.2.0
            Reporter: Petri


I have a Spark driver running in a Kubernetes pod in client deploy mode, and it 
tries to start an executor.
The executor fails with this error:

    {"type":"log", "level":"ERROR", "name":"STREAMING_OTHERS",
    "time":"2022-01-14T12:29:38.318Z", "timezone":"UTC",
    "class":"dispatcher-Executor",
    "method":"spark.executor.CoarseGrainedExecutorBackend.logError(73)",
    "log":"Executor self-exiting due to : Driver 192-168-39-71.mni-system.pod.cluster.local:40752 disassociated! Shutting down.\n"}

The driver then attempts to start another executor, which fails with the same 
error, and this repeats indefinitely.
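
The executor message names the driver endpoint it lost contact with. As a rough diagnostic sketch, assuming (this is not confirmed by the logs) that the executor pod cannot reach that endpoint, the address can be pulled out of the message and probed with a plain TCP connection from a pod on the executor side. The helper names `parse_driver_endpoint` and `can_connect` are made up for illustration and are not part of Spark.

```python
import re
import socket

# Message text copied from the executor log above.
LOG_MESSAGE = (
    "Executor self-exiting due to : Driver "
    "192-168-39-71.mni-system.pod.cluster.local:40752 disassociated! "
    "Shutting down."
)


def parse_driver_endpoint(message):
    """Return (host, port) from a 'Driver host:port disassociated' message, or None."""
    match = re.search(r"Driver\s+(\S+):(\d+)\s+disassociated", message)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def can_connect(host, port, timeout=3.0):
    """Attempt a TCP connection; False on DNS failure, refusal, or timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


endpoint = parse_driver_endpoint(LOG_MESSAGE)
print(endpoint)
# Hypothetical follow-up, to be run from inside an executor-side pod:
# print(can_connect(*endpoint))
```

If `can_connect` returns False from the executor side, name resolution or routing to the driver pod would be the first thing to look at. A True result only shows that the port is open at that moment, not that the Spark RPC handshake succeeds.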

In the driver pod, I see only the following errors:

    22/01/14 12:26:32 ERROR TaskSchedulerImpl: Lost executor 1 on 192.168.43.250:
    22/01/14 12:27:16 ERROR TaskSchedulerImpl: Lost executor 2 on 192.168.43.233:
    22/01/14 12:27:59 ERROR TaskSchedulerImpl: Lost executor 3 on 192.168.43.221:
    22/01/14 12:28:43 ERROR TaskSchedulerImpl: Lost executor 4 on 192.168.43.217:
    22/01/14 12:29:27 ERROR TaskSchedulerImpl: Lost executor 5 on 192.168.43.197:
    22/01/14 12:30:10 ERROR TaskSchedulerImpl: Lost executor 6 on 192.168.43.237:
    22/01/14 12:30:53 ERROR TaskSchedulerImpl: Lost executor 7 on 192.168.43.196:
    22/01/14 12:31:42 ERROR TaskSchedulerImpl: Lost executor 8 on 192.168.43.228:
    22/01/14 12:32:31 ERROR TaskSchedulerImpl: Lost executor 9 on 192.168.43.254:
    22/01/14 12:33:14 ERROR TaskSchedulerImpl: Lost executor 10 on 192.168.43.204:
    22/01/14 12:33:57 ERROR TaskSchedulerImpl: Lost executor 11 on 192.168.43.231:
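
The timestamps show how regular the failures are. A small sketch that computes the gap between consecutive "Lost executor" lines, using the first four entries of the listing above as sample input; it assumes the timestamps use a `yy/MM/dd HH:mm:ss` layout, and the function names are illustrative only.

```python
from datetime import datetime

# First four driver log entries from the listing above, one entry per line.
DRIVER_LOG = """\
22/01/14 12:26:32 ERROR TaskSchedulerImpl: Lost executor 1 on 192.168.43.250:
22/01/14 12:27:16 ERROR TaskSchedulerImpl: Lost executor 2 on 192.168.43.233:
22/01/14 12:27:59 ERROR TaskSchedulerImpl: Lost executor 3 on 192.168.43.221:
22/01/14 12:28:43 ERROR TaskSchedulerImpl: Lost executor 4 on 192.168.43.217:
"""


def loss_times(log_text):
    """Collect the timestamp of every 'Lost executor' line."""
    times = []
    for line in log_text.splitlines():
        if "Lost executor" not in line:
            continue
        stamp = " ".join(line.split()[:2])  # date and time fields
        times.append(datetime.strptime(stamp, "%y/%m/%d %H:%M:%S"))
    return times


def gaps_in_seconds(times):
    """Seconds elapsed between each pair of consecutive timestamps."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


print(gaps_in_seconds(loss_times(DRIVER_LOG)))  # prints [44, 43, 44]
```

Across the full listing the gaps stay between 43 and 49 seconds, so each replacement executor is lost less than a minute after the previous one.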

What is wrong? And how can I get executors running correctly?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
