[
https://issues.apache.org/jira/browse/SPARK-35416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18058245#comment-18058245
]
Romain Manni-Bucau edited comment on SPARK-35416 at 2/12/26 11:36 PM:
----------------------------------------------------------------------
[~dongjoon] I'm not sure I understand which bug you are referring to: the
operator assigns an executor a PVC that is already assigned to the driver, so it
can't work. Do you mean a Spark operator bug? Is there a ticket about it? I'm on
EKS with k8s 1.33 and the EBS CSI driver, so AFAIK it should work unless there
is a bug in the operator. Indeed, PVCs are "OnDemand" on Spark 4.0.1.
I'm also not sure how this is supposed to work between
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator#getReusablePVCs
and
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator#replacePVCsIfNeeded,
since they only check the storage class and storage size (there is no
distinction between executor and driver, for example), and there is no lock or
"skip because in progress" flag in
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator#onNewSnapshots.
That method runs every second and calls the API heavily, so rounds can easily
overlap, leading to bad PVC reuse.
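To illustrate the missing guard, here is a minimal Python sketch, not Spark's actual Scala code. `Pvc` and `PvcReusePool` are hypothetical names, PVCs are modelled as plain records instead of Kubernetes client objects, and matching is done on storage class and size only, as described for getReusablePVCs and replacePVCsIfNeeded. A lock plus a set of already claimed PVC names plays the role of the "skip because in progress" flag, so two overlapping rounds cannot hand out the same PVC.

```python
import threading
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Pvc:
    """Hypothetical, simplified view of a PersistentVolumeClaim."""
    name: str
    storage_class: str
    size_gi: int


class PvcReusePool:
    """Hands out each reusable PVC at most once, even when allocation rounds overlap."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._claimed: set[str] = set()  # PVC names already given to an executor

    def claim(self, candidates: Iterable[Pvc], storage_class: str,
              size_gi: int) -> Optional[Pvc]:
        # The lock makes "check if free" and "mark as claimed" one atomic step,
        # so two concurrent rounds can never both pick the same PVC.
        with self._lock:
            for pvc in candidates:
                if pvc.name in self._claimed:
                    continue  # in progress for another executor: skip it
                # Same criteria as described in the comment: storage class and size only.
                if pvc.storage_class == storage_class and pvc.size_gi == size_gi:
                    self._claimed.add(pvc.name)
                    return pvc
            return None

    def release(self, name: str) -> None:
        # Make the PVC available again, e.g. once its executor has terminated.
        with self._lock:
            self._claimed.discard(name)
```

When to call `release` (for example once the executor pod is gone) is left open here; the point is only that checking and claiming happen under the same lock.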
For the driver vs. executor case, filtering on the spark-role label could help,
but it seems the issue can happen between executors as well. For executors,
checking whether the PVC status is != Bound can help. Then, if the pod is still
pending at the next iteration, check the status of the PVC (Bound, but the
pod's latest condition contains the snippet "N node has pod using
PersistentVolumeClaim with the same name and ReadWriteOncePod access mode" with
N > 1) and potentially fix it too, by either deleting/recreating the pod or
patching the PVC.
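The two checks proposed above could look roughly like this. It is a hedged Python sketch under several assumptions: `Pvc`, `Pod`, `is_reusable`, `rwop_conflict_count`, `needs_pvc_fix` and its `threshold` parameter are hypothetical names; PVCs are assumed to carry a `spark-role` label; the "!= Bound" check is read as "a PVC that is not yet Bound is still being set up for another pod, so skip it"; and the conflict is detected by matching the quoted snippet in the pod's latest condition message, with the comment's N > 1 as the default.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Snippet reported in the pending pod's latest condition; N is captured.
RWOP_CONFLICT = re.compile(
    r"(\d+) node has pod using PersistentVolumeClaim with the same name "
    r"and ReadWriteOncePod access mode"
)


@dataclass
class Pvc:
    """Hypothetical, simplified view of a PersistentVolumeClaim."""
    name: str
    phase: str  # status.phase, e.g. "Pending" or "Bound"
    labels: dict = field(default_factory=dict)


@dataclass
class Pod:
    """Hypothetical, simplified view of an executor pod."""
    name: str
    phase: str  # status.phase, e.g. "Pending" or "Running"
    last_condition_message: str = ""  # message of the most recent pod condition


def is_reusable(pvc: Pvc) -> bool:
    # Driver vs executor: never hand the driver's PVC to an executor
    # (assumes the PVC carries a spark-role label).
    if pvc.labels.get("spark-role") == "driver":
        return False
    # Assumption: a PVC that is not yet Bound is still being set up for
    # another pod, so it is treated as in progress and skipped.
    return pvc.phase == "Bound"


def rwop_conflict_count(message: str) -> Optional[int]:
    # Returns N from the ReadWriteOncePod conflict snippet, or None if absent.
    match = RWOP_CONFLICT.search(message)
    return int(match.group(1)) if match else None


def needs_pvc_fix(pod: Pod, pvc: Pvc, threshold: int = 1) -> bool:
    # Pod still Pending at the next iteration, PVC Bound, and the latest
    # condition reports the conflict with N > threshold (N > 1 in the comment).
    if pod.phase != "Pending" or pvc.phase != "Bound":
        return False
    n = rwop_conflict_count(pod.last_condition_message)
    return n is not None and n > threshold
```

When `needs_pvc_fix` returns True, the actual remediation (deleting/recreating the pod or patching the PVC) would still have to go through the Kubernetes API; it is not sketched here.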
> Support PersistentVolumeClaim Reuse
> -----------------------------------
>
> Key: SPARK-35416
> URL: https://issues.apache.org/jira/browse/SPARK-35416
> Project: Spark
> Issue Type: Improvement
> Components: Kubernetes
> Affects Versions: 3.2.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Major
> Fix For: 3.2.0
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)