Hi all,

I propose adding native support for *remote storage as a shuffle store in Spark*.
This approach would fundamentally decouple shuffle storage from compute nodes, mitigating *shuffle fetch failures* and enabling *more aggressive downscaling*. The primary goal is to improve the *elasticity and resilience* of Spark workloads, which in turn opens up substantial cost-optimization opportunities.

*I welcome any initial thoughts or concerns regarding this idea. Looking forward to your feedback!*

JIRA: SPARK-54327 <https://issues.apache.org/jira/browse/SPARK-54327>
SPIP doc: <https://docs.google.com/document/d/1leywkLgD62-MdG7e57n0vFRi7ICNxn9el9hpgchsVnk/edit?tab=t.0#heading=h.u4h68wupq6lw>
Design doc: <https://docs.google.com/document/d/1tuWyXAaIBR0oVD5KZwYvz7JLyn6jB55_35xeslUEu7s/edit?tab=t.0>
PoC PR: <https://github.com/apache/spark/pull/53028>
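For readers who want a concrete picture of how such a store could be wired in, below is a minimal Scala sketch assuming the feature plugs into Spark's existing pluggable shuffle I/O hook (spark.shuffle.sort.io.plugin.class, SPARK-25299). The plugin class name and the remote-path config key are illustrative placeholders, not taken from the SPIP or the PoC PR.

  import org.apache.spark.sql.SparkSession

  // Hypothetical wiring: RemoteShuffleDataIO and the
  // spark.shuffle.remote.storage.path key are placeholders for illustration.
  val spark = SparkSession.builder()
    .appName("remote-shuffle-demo")
    // Spark's pluggable shuffle I/O interface (SPARK-25299); the default
    // implementation writes shuffle data to local disk.
    .config("spark.shuffle.sort.io.plugin.class",
            "org.apache.spark.shuffle.remote.RemoteShuffleDataIO") // hypothetical
    // Hypothetical location for shuffle blocks on remote storage.
    .config("spark.shuffle.remote.storage.path", "s3a://my-bucket/shuffle")
    .getOrCreate()

With shuffle blocks on remote storage, an executor that is lost or scaled in no longer takes its map outputs with it, which is what avoids fetch failures and recomputation during downscaling.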
Thanks,
Karuppayya