Alex created SPARK-38662:
----------------------------
Summary: Spark looses k8s auth after some time
Key: SPARK-38662
URL: https://issues.apache.org/jira/browse/SPARK-38662
Project: Spark
Issue Type: Bug
Components: Kubernetes
Affects Versions: 3.2.1
Reporter: Alex
Spark starts to fail with error listed below after some time of working:
{noformat}
[2022-03-25 17:11:12,706] INFO (Logging.scala:57) - Adding decommission script
to lifecycle
[2022-03-25 17:11:12,712] WARN (Logging.scala:90) - Exception when notifying
snapshot subscriber.
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST
at: https://cluster_endpoint/api/v1/namespaces/spark/pods. Message:
Unauthorized! Token may have expired! Please log-in again. Unauth
orized.
at
io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:639)
at
io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:576)
at
io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:543)
at
io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:504)
at
io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:292)
at
io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:893)
at
io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:372)
at
io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:86)
at
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$requestNewExecutors$1(ExecutorPodsAllocator.scala:400)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
at
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.requestNewExecutors(ExecutorPodsAllocator.scala:382)
at
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$onNewSnapshots$36(ExecutorPodsAllocator.scala:346)
at
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$onNewSnapshots$36$adapted(ExecutorPodsAllocator.scala:339)
at
scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at
scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.onNewSnapshots(ExecutorPodsAllocator.scala:339)
at
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$start$3(ExecutorPodsAllocator.scala:117)
at
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$start$3$adapted(ExecutorPodsAllocator.scala:117)
at
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsSnapshotsStoreImpl$SnapshotsSubscriber.org$apache$spark$scheduler$cluster$k8s$ExecutorPodsSnapshotsStoreImpl$SnapshotsSubscriber$$processSnapshotsInt
ernal(ExecutorPodsSnapshotsStoreImpl.scala:138)
at
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsSnapshotsStoreImpl$SnapshotsSubscriber.processSnapshots(ExecutorPodsSnapshotsStoreImpl.scala:126)
at
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsSnapshotsStoreImpl.$anonfun$addSubscriber$1(ExecutorPodsSnapshotsStoreImpl.scala:81)
at
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at
java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
at
java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834){noformat}
This doesn't reproduce on 3.1.1 with the same configs, environment and workload.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]