[ 
https://issues.apache.org/jira/browse/SPARK-44526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Faiz Halde updated SPARK-44526:
-------------------------------
    Description: 
Hi,

This ticket is meant to scope the work involved in porting the k8s PVC reuse 
feature, which reuses shuffle files already present on local disk, to the 
Spark standalone cluster manager.

We are heavy users of spot instances, and spot terminations disrupt our 
long-running jobs.

The logic in `KubernetesLocalDiskShuffleExecutorComponents` itself is fairly 
small. However, when I tried the same approach with 
`LocalDiskShuffleExecutorComponents`, the experiment did not succeed, which 
suggests there is more to recovering shuffle files.

I'd like to understand what work would be involved. We'd be more than happy 
to contribute.

  was:
Hi,

This ticket is meant to understand the work that would be involved in porting 
the k8s PVC reuse feature onto the spark standalone cluster manager which 
reuses the shuffle files present locally in the disk

We are a heavy user of spot instances and we suffer from spot terminations 
impacting our long running jobs

The logic in `KubernetesLocalDiskShuffleDataIO`
itself is not that much. However when I tried this on the 
`LocalDiskShuffleExecutorComponents` it was not a successful experiment which 
suggests there is more to it

I'd like to understand what will be the work involved for this. We'll be more 
than happy to contribute


> Porting k8s PVC reuse logic to spark standalone
> -----------------------------------------------
>
>                 Key: SPARK-44526
>                 URL: https://issues.apache.org/jira/browse/SPARK-44526
>             Project: Spark
>          Issue Type: New Feature
>          Components: Shuffle, Spark Core
>    Affects Versions: 3.4.1
>            Reporter: Faiz Halde
>            Priority: Major
>
> Hi,
> This ticket is meant to scope the work involved in porting the k8s PVC reuse 
> feature, which reuses shuffle files already present on local disk, to the 
> Spark standalone cluster manager.
> We are heavy users of spot instances, and spot terminations disrupt our 
> long-running jobs.
> The logic in `KubernetesLocalDiskShuffleExecutorComponents` itself is fairly 
> small. However, when I tried the same approach with 
> `LocalDiskShuffleExecutorComponents`, the experiment did not succeed, which 
> suggests there is more to recovering shuffle files.
> I'd like to understand what work would be involved. We'd be more than happy 
> to contribute.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

