Hi,
another remark regarding a remote shuffle storage solution:
As long as the map executors are alive, reduce executors should read
from them to avoid any extra delay / overhead.
On fetch failure from a map executor, the reduce executors should fall
back to a remote storage that provides a copy (merged or not) of the
shuffle data.
Cheers,
Enrico
On 13.11.25 at 09:42, Enrico Minack wrote:
Hi Karuppayya,
thanks for your proposal and bringing up this issue.
I am very much in favour of a shuffle storage solution that supports
dynamic allocation and tolerates node failure in a K8S environment,
without the burden of managing a Remote Shuffle Service.
I have the following comments:
Your proposed consolidation stage is equivalent to the next reducer
stage in the sense that it reads shuffle data from the earlier map
stage. This requires the executors of the map stage to survive until
the shuffle data are consolidated ("merged" in Spark terminology).
Therefore, I think this passage of your design document is not accurate:
Executors that perform the initial map tasks (shuffle writers) can
be immediately deallocated after writing their shuffle data ...
Since the consolidation stage reads all the shuffle data, why not
do the transformation in that stage? What is the point of deferring
the transformations to another stage?
You mention the "Native Shuffle Block Migration" and say its
limitation is "It simply shifts the storage burden to other active
executors".
Please consider that the migration process can migrate shuffle data to
what Spark calls a fallback storage, which essentially copies the
shuffle data to a remote storage.
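For reference, the existing fallback-storage path is enabled purely via
configuration. A minimal sketch (the s3a path is a placeholder, not from
this thread; the flags are Spark's decommissioning configs):

```shell
# Sketch: let decommissioning executors migrate shuffle blocks,
# falling back to remote storage when no peer executor can take them.
spark-submit \
  --conf spark.decommission.enabled=true \
  --conf spark.storage.decommission.enabled=true \
  --conf spark.storage.decommission.shuffleBlocks.enabled=true \
  --conf spark.storage.decommission.fallbackStorage.path=s3a://my-bucket/spark-fallback/ \
  ...
```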
Kind regards,
Enrico
On 13.11.25 at 01:40, karuppayya wrote:
Hi All,
I propose to utilize *Remote Storage as a Shuffle Store, natively in
Spark* .
This approach would fundamentally decouple shuffle storage from
compute nodes, mitigating *shuffle fetch failures* and helping with
*aggressive downscaling*.
The primary goal is to enhance the *elasticity and resilience* of
Spark workloads, leading to substantial cost optimization opportunities.
*I welcome any initial thoughts or concerns regarding this idea.*
*Looking forward to your feedback!*
JIRA: SPARK-53484 <https://issues.apache.org/jira/browse/SPARK-54327>
SPIP doc
<https://docs.google.com/document/d/1leywkLgD62-MdG7e57n0vFRi7ICNxn9el9hpgchsVnk/edit?tab=t.0#heading=h.u4h68wupq6lw>,
Design doc
<https://docs.google.com/document/d/1tuWyXAaIBR0oVD5KZwYvz7JLyn6jB55_35xeslUEu7s/edit?tab=t.0>
PoC PR <https://github.com/apache/spark/pull/53028>
Thanks,
Karuppayya