Hi, Arun.

SPARK-35593 (Support shuffle data recovery on the reused PVCs) was an
Apache Spark 3.2.0 feature whose plugin intentionally follows only the
legacy Spark shuffle directory structure, to be safe.

You can see the AS-IS test coverage in the corresponding
`KubernetesLocalDiskShuffleDataIOSuite`.

https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/shuffle/KubernetesLocalDiskShuffleDataIOSuite.scala

To be clear, Apache Spark keeps the supported directory structure
unchanged for historical reasons.

You can use a different structure by simply implementing your own plugin,
like KubernetesLocalDiskShuffleDataIO. It's extensible.
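For reference, enabling the built-in plugin is a matter of configuration.
A minimal sketch, assuming a Kubernetes deployment where driver-owned
PVCs are reused across executors (the API server address is a
placeholder, and the exact set of PVC-related settings depends on your
cluster setup):

```shell
# Hypothetical spark-submit invocation; adjust master URL and PVC
# settings for your environment.
spark-submit \
  --master k8s://https://<k8s-apiserver>:6443 \
  --conf spark.shuffle.sort.io.plugin.class=org.apache.spark.shuffle.KubernetesLocalDiskShuffleDataIO \
  --conf spark.kubernetes.driver.reusePersistentVolumeClaim=true \
  --conf spark.kubernetes.driver.ownPersistentVolumeClaim=true \
  ...
```

A custom plugin would replace the `spark.shuffle.sort.io.plugin.class`
value with your own implementation's fully qualified class name.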

Dongjoon.


On Fri, Aug 11, 2023 at 4:52 AM Arun Ravi <arunrav...@gmail.com> wrote:

> Hi Team,
>
> I am using the recently released shuffle recovery feature using
> `KubernetesLocalDiskShuffleDataIO` plugin class on Spark 3.4.1.
>
> Can someone explain why the mount path has spark-x/executor-x/ pattern
> dependency? I got this path detail from this PR
> <https://github.com/apache/spark/pull/42417>. Is it to avoid other
> folders in the volume? Also, does this mean the path should use the
> executor ID and Spark app ID, or just the hardcoded spark-x/executor-x/?
> Sorry, I couldn't fully understand the reasoning for this. Any help
> will be super useful.
>
>
> Arun Ravi M V
> B.Tech (Batch: 2010-2014)
>
> Computer Science and Engineering
>
> Govt. Model Engineering College
> Cochin University Of Science And Technology
> Kochi
> arunrav...@gmail.com
> +91 9995354581
> Skype : arunravimv
>
