[ 
https://issues.apache.org/jira/browse/SPARK-35416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18058245#comment-18058245
 ] 

Romain Manni-Bucau edited comment on SPARK-35416 at 2/12/26 11:36 PM:
----------------------------------------------------------------------

[~dongjoon] I'm not sure I understand which bug you are referring to. The 
operator assigns a PVC to an executor when that PVC is already assigned to the 
driver, so it can't work. Do you mean a Spark operator bug? Is there a ticket 
for it? I'm on EKS with k8s 1.33 and the EBS CSI driver, so it should work 
unless there is a bug in the operator, AFAIK. And yes, the PVCs are "OnDemand" 
on Spark 4.0.1.

I'm also not sure how 
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator#getReusablePVCs 
and 
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator#replacePVCsIfNeeded
are supposed to work together, since they only check the storage class and the 
storage size (no distinction between executor and driver, for example). There 
is also no lock or "skip because a run is in progress" flag in 
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator#onNewSnapshots, 
which runs every second and calls the API heavily, so runs can easily overlap, 
leading to bad PVC reuse.
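
To illustrate the kind of "skip because a run is in progress" flag meant 
above, here is a minimal Python sketch. It is not Spark code: SnapshotHandler, 
on_new_snapshots and the allocate callback are hypothetical names, and it 
assumes the periodic callback really can overlap with a previous run.

```python
import threading


class SnapshotHandler:
    """Hypothetical stand-in for a periodic callback such as onNewSnapshots."""

    def __init__(self, allocate):
        self._allocate = allocate             # work that lists and assigns PVCs
        self._in_progress = threading.Lock()  # held while one round is running
        self.skipped = 0                      # number of rounds that were skipped

    def on_new_snapshots(self, snapshot):
        # Non-blocking acquire: if a previous round still holds the lock,
        # skip this round instead of running a second allocation in parallel.
        if not self._in_progress.acquire(blocking=False):
            self.skipped += 1
            return False
        try:
            self._allocate(snapshot)
            return True
        finally:
            self._in_progress.release()
```

A skipped round is simply retried at the next tick, so the guard trades a bit 
of latency for never having two rounds pick the same PVC at the same time.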

For driver vs. executor, filtering on the spark-role label could help, but it 
seems the issue can happen between executors as well. For executors, checking 
whether the PVC status is != Bound can help; then, if the pod is still pending 
at the next iteration, check the PVC status again (Bound, but the latest pod 
condition contains the snippet "N node has pod using PersistentVolumeClaim 
with the same name and ReadWriteOncePod access mode" with N > 1) and 
potentially fix it too, by either deleting and recreating the pod or patching 
the PVC.
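
Two of these checks can be sketched in Python as follows. This is only an 
illustration on plain dicts, not the Spark or Kubernetes client API: the 
function names and the dict layout are hypothetical, and it assumes the PVCs 
carry the same spark-role label as the pods.

```python
import re

# Snippet quoted from the comment above; matched against the latest
# condition message of the pod.
CONFLICT_SNIPPET = (
    "has pod using PersistentVolumeClaim with the same name "
    "and ReadWriteOncePod access mode"
)


def role_matches(pvc, role="executor"):
    """Keep only PVCs whose spark-role label equals the wanted role."""
    return pvc.get("labels", {}).get("spark-role") == role


def has_pvc_conflict(pod):
    """True when a Pending pod's latest condition reports the PVC conflict."""
    if pod.get("phase") != "Pending" or not pod.get("conditions"):
        return False
    message = pod["conditions"][-1].get("message", "")
    # Collapse runs of whitespace so wrapping or padding does not break the match.
    return CONFLICT_SNIPPET in re.sub(r"\s+", " ", message)
```

A pod flagged by has_pvc_conflict would then be a candidate for the 
delete/recreate or PVC patch mentioned above.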

 



> Support PersistentVolumeClaim Reuse
> -----------------------------------
>
>                 Key: SPARK-35416
>                 URL: https://issues.apache.org/jira/browse/SPARK-35416
>             Project: Spark
>          Issue Type: Improvement
>          Components: Kubernetes
>    Affects Versions: 3.2.0
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>            Priority: Major
>             Fix For: 3.2.0
>
>



