Hi All,

I propose to use *Remote Storage as a Shuffle Store, natively in Spark*.

This approach would fundamentally decouple shuffle storage from compute
nodes, mitigating *shuffle fetch failures* and enabling *aggressive
downscaling* of executors.

The primary goal is to improve the *elasticity and resilience* of Spark
workloads, which in turn opens up substantial cost-optimization
opportunities, since executors can be released without losing the shuffle
data they produced.
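
To make the idea concrete, below is a rough sketch of how this could
surface to users, building on Spark's existing shuffle storage plugin API
(the spark.shuffle.sort.io.plugin.class entry point from the SPARK-25299
effort). The plugin class and the remote-path key here are hypothetical
placeholders for illustration, not the names used in the PoC:

  import org.apache.spark.sql.SparkSession

  // Hypothetical sketch: RemoteShuffleDataIO and the "remote.rootDir" key
  // are placeholders, not the PoC's actual names.
  // spark.shuffle.sort.io.plugin.class is Spark's existing shuffle storage
  // plugin entry point.
  val spark = SparkSession.builder()
    .appName("remote-shuffle-sketch")
    // Swap the default local-disk shuffle IO for a remote-storage-backed one.
    .config("spark.shuffle.sort.io.plugin.class",
      "com.example.shuffle.RemoteShuffleDataIO")
    // Remote location where shuffle blocks would be written to and fetched from.
    .config("spark.shuffle.remote.rootDir", "s3a://my-bucket/spark-shuffle")
    .getOrCreate()

With shuffle blocks living on remote storage, losing or decommissioning an
executor no longer invalidates its shuffle output, which is what enables
both fewer fetch failures and more aggressive downscaling.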

*I welcome any initial thoughts or concerns regarding this idea.*
*Looking forward to your feedback!*

JIRA: SPARK-53484 <https://issues.apache.org/jira/browse/SPARK-53484>
SPIP doc: <https://docs.google.com/document/d/1leywkLgD62-MdG7e57n0vFRi7ICNxn9el9hpgchsVnk/edit?tab=t.0#heading=h.u4h68wupq6lw>
Design doc: <https://docs.google.com/document/d/1tuWyXAaIBR0oVD5KZwYvz7JLyn6jB55_35xeslUEu7s/edit?tab=t.0>
PoC PR: <https://github.com/apache/spark/pull/53028>

Thanks,
Karuppayya
