[ https://issues.apache.org/jira/browse/SPARK-44526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Faiz Halde updated SPARK-44526:
-------------------------------
Description:
Hi,
This ticket is meant to scope the work involved in porting the k8s PVC reuse
feature, which reuses shuffle files already present on local disk, to the
Spark standalone cluster manager.
We are heavy users of spot instances, and spot terminations impact our
long-running jobs.
The logic in `KubernetesLocalDiskShuffleDataIO` itself is fairly small.
However, when I tried the same approach with
`LocalDiskShuffleExecutorComponents`, the experiment did not succeed, which
suggests there is more to it.
I'd like to understand what work this would involve. We'd be more than happy
to contribute.
was:
Hi,
This ticket is meant to understand the work that would be involved in porting
the k8s PVC reuse feature onto the spark standalone cluster manager which
reuses the shuffle files present locally in the disk
We are a heavy user of spot instances and we suffer from spot terminations
impacting our long running jobs
The logic in
KubernetesLocalDiskShuffleDataIO
itself is not that much. However when I tried this on the
`LocalDiskShuffleExecutorComponents` it was not a successful experiment which
suggests there is more to it
I'd like to understand what will be the work involved for this. We'll be more
than happy to contribute
> Porting k8s PVC reuse logic to spark standalone
> -----------------------------------------------
>
> Key: SPARK-44526
> URL: https://issues.apache.org/jira/browse/SPARK-44526
> Project: Spark
> Issue Type: New Feature
> Components: Shuffle, Spark Core
> Affects Versions: 3.4.1
> Reporter: Faiz Halde
> Priority: Major
>
> Hi,
> This ticket is meant to scope the work involved in porting the k8s PVC reuse
> feature, which reuses shuffle files already present on local disk, to the
> Spark standalone cluster manager.
> We are heavy users of spot instances, and spot terminations impact our
> long-running jobs.
> The logic in `KubernetesLocalDiskShuffleDataIO` itself is fairly small.
> However, when I tried the same approach with
> `LocalDiskShuffleExecutorComponents`, the experiment did not succeed, which
> suggests there is more to it.
> I'd like to understand what work this would involve. We'd be more than happy
> to contribute.
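
For anyone scoping this, the discovery half of the idea (finding shuffle files
that survived on local disk) can be sketched outside Spark. The snippet below
is only an illustration, not the logic in `KubernetesLocalDiskShuffleDataIO`:
the helper name `find_recoverable_shuffle_outputs` is hypothetical, and it
assumes files follow the `shuffle_<shuffleId>_<mapId>_<reduceId>.data` /
`.index` naming that Spark uses for shuffle block IDs, and that a map output is
only worth recovering when both files are present.

```python
import re
from pathlib import Path

# Assumed naming convention for shuffle files on local disk:
#   shuffle_<shuffleId>_<mapId>_<reduceId>.data / .index
SHUFFLE_FILE = re.compile(r"^shuffle_(\d+)_(\d+)_(\d+)\.(data|index)$")


def find_recoverable_shuffle_outputs(local_dirs):
    """Hypothetical helper: return sorted (shuffle_id, map_id) pairs that
    have both a data file and an index file under the given directories."""
    seen = {}  # (shuffle_id, map_id) -> set of file extensions found
    for root in local_dirs:
        # Walk the directory tree; block manager subdirectories are nested.
        for path in Path(root).rglob("shuffle_*"):
            match = SHUFFLE_FILE.match(path.name)
            if match and path.is_file():
                shuffle_id, map_id, _reduce_id, ext = match.groups()
                seen.setdefault((int(shuffle_id), int(map_id)), set()).add(ext)
    # Keep only map outputs whose data and index files both survived.
    return sorted(key for key, exts in seen.items() if exts == {"data", "index"})
```

Discovery alone is probably not the whole story. The recovered outputs would
still have to be made known to the new executor's block manager and to the
driver, which may be where the "more to it" mentioned above lies.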