Enrico Minack created SPARK-52508:
-------------------------------------
Summary: Read from fallback storage should consider replication delay
Key: SPARK-52508
URL: https://issues.apache.org/jira/browse/SPARK-52508
Project: Spark
Issue Type: Sub-task
Components: k8s, Kubernetes
Affects Versions: 4.1.0
Reporter: Enrico Minack
When the storage decommissioning feature is used on Kubernetes with a
distributed filesystem as the fallback storage, an executor may be unable to
see shuffle data that the decommissioned executor has just written to that
filesystem. This is caused by replication delay in the distributed filesystem.
Because the dependent executor knows that the shuffle data is located in the
fallback storage, it can wait and retry the read when it encounters a
{{FileNotFoundException}}.
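
The deferred read described above could look roughly like the following. This is a minimal, language-neutral sketch in Python, not Spark's actual implementation: the function name `read_with_retry`, the parameters `max_attempts` and `wait_seconds`, and the injectable `sleep` hook are all hypothetical, and Python's built-in `FileNotFoundError` stands in for the JVM's {{FileNotFoundException}}.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def read_with_retry(
    read: Callable[[], T],
    max_attempts: int = 5,        # hypothetical setting, not a Spark config
    wait_seconds: float = 0.5,    # hypothetical setting, not a Spark config
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `read`, retrying while the file is not yet visible.

    Only FileNotFoundError is retried; any other error propagates at once.
    After `max_attempts` failed attempts the last FileNotFoundError is re-raised.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return read()
        except FileNotFoundError:
            if attempt == max_attempts:
                # The file never appeared: surface the original error.
                raise
            # Give the filesystem time to replicate before the next attempt.
            sleep(wait_seconds)
    raise AssertionError("unreachable")
```

A fixed wait is the simplest choice here; an increasing backoff would work the same way. The retry is bounded so that a genuinely missing shuffle file still fails the read rather than blocking it indefinitely.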