Oscar Torreno created SPARK-49079:
-------------------------------------

             Summary: Spark jobs failing with UnknownHostException on executors 
if driver readiness timeout elapsed
                 Key: SPARK-49079
                 URL: https://issues.apache.org/jira/browse/SPARK-49079
             Project: Spark
          Issue Type: Bug
          Components: k8s
    Affects Versions: 3.5.0
         Environment: Running Spark jobs on the EMR on EKS offering from AWS, 
which uses Spark 3.5.0 under the hood
            Reporter: Oscar Torreno


We have seen Spark jobs fail when ExecutorPodsAllocator times out while 
waiting for the driver pod to reach the READY status. When that happens, we 
have observed two scenarios, both leading to the same result (executors failing 
with an UnknownHostException while trying to resolve the k8s service for the 
Spark driver):
 * The Kubernetes service is never created (confirmed via the k8s service 
created event/metric available in Grafana)
 * The Kubernetes service exists, but the executors still cannot resolve its 
hostname (possibly because the service only becomes fully available once the 
driver pod is ready, and the executors tried to resolve the hostname before that)

The code in question is 
[https://github.com/apache/spark/blob/v3.5.0/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala#L130]
 
{code:java}
    driverPod.foreach { pod =>
      // Wait until the driver pod is ready before starting executors, as the headless service won't
      // be resolvable by DNS until the driver pod is ready.
      Utils.tryLogNonFatalError {
        kubernetesClient
          .pods()
          .inNamespace(namespace)
          .withName(pod.getMetadata.getName)
          .waitUntilReady(driverPodReadinessTimeout, TimeUnit.SECONDS)
      }
    } {code}
Interestingly, the comment says to wait until the driver pod is ready because 
otherwise the service will not be resolvable by the executors, yet when the 
wait times out we still let the run continue.

Also worth mentioning is the documentation for the readiness timeout config 
([https://github.com/apache/spark/blob/v3.5.0/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala#L454])
{code:java}
  val KUBERNETES_ALLOCATION_DRIVER_READINESS_TIMEOUT =
    ConfigBuilder("spark.kubernetes.allocation.driver.readinessTimeout")
      .doc("Time to wait for driver pod to get ready before creating executor pods. This wait " +
        "only happens on application start. If timeout happens, executor pods will still be " +
        "created.")
      .version("3.1.3")
      .timeConf(TimeUnit.SECONDS)
      .checkValue(value => value > 0, "Allocation driver readiness timeout must be a positive "
        + "time value.")
      .createWithDefaultString("1s") {code}
Please note the sentence "If timeout happens, executor pods will still be 
created", which conflicts (at least in my head) with the code comment on the 
wait for the driver pod.

The question is: is this the intended behaviour? It looks like a bug. Maybe we 
should check once more, right before creating the executors, whether the driver 
pod is ready, and fail the job otherwise?
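
To make the suggestion concrete, here is a minimal sketch of a fail-fast 
readiness gate. It is not Spark code: DriverReadinessGate, awaitReady and 
requireDriverReady are hypothetical names, and the readiness check is passed in 
as a plain function so the sketch stays independent of the Kubernetes client. 
It only illustrates the idea of refusing to create executors when the driver 
never became ready.

```scala
import scala.concurrent.duration._

// Hypothetical helper, not part of Spark: gates executor creation on driver readiness.
object DriverReadinessGate {

  /** Polls `isReady` until it returns true or `timeout` elapses; returns the last observed state. */
  def awaitReady(
      isReady: () => Boolean,
      timeout: FiniteDuration,
      pollInterval: FiniteDuration = 100.millis): Boolean = {
    val deadline = timeout.fromNow
    var ready = isReady()
    while (!ready && deadline.hasTimeLeft()) {
      // Never sleep past the deadline, and never pass a negative value to Thread.sleep.
      val sleepMs = math.max(0L, math.min(pollInterval.toMillis, deadline.timeLeft.toMillis))
      Thread.sleep(sleepMs)
      ready = isReady()
    }
    ready
  }

  /** Fails fast instead of letting executors start against an unresolvable driver service. */
  def requireDriverReady(isReady: () => Boolean, timeout: FiniteDuration): Unit = {
    if (!awaitReady(isReady, timeout)) {
      throw new IllegalStateException(
        s"Driver pod not ready after $timeout; refusing to create executor pods")
    }
  }
}
```

In the allocator, the isReady function would presumably wrap a lookup of the 
driver pod's readiness through the Kubernetes client; that wiring is an 
assumption and is left out here.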

For now we are trying to mitigate this by increasing the readiness timeout 
value as a band-aid fix.
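
For reference, a sketch of that workaround set programmatically. The config key 
is the one from Config.scala quoted above; the "60s" value and the 
ReadinessTimeoutWorkaround object name are only illustrative assumptions, not a 
recommendation. The same key can also be passed with --conf on spark-submit.

```scala
import org.apache.spark.SparkConf

object ReadinessTimeoutWorkaround {
  // Raises the driver readiness wait from its 1s default.
  // "60s" is an arbitrary illustrative value; tune it for your cluster.
  val conf: SparkConf = new SparkConf()
    .set("spark.kubernetes.allocation.driver.readinessTimeout", "60s")
}
```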



