Ziqi Liu created SPARK-43300:
--------------------------------
Summary: Cascade failure in Guava cache due to fate-sharing
Key: SPARK-43300
URL: https://issues.apache.org/jira/browse/SPARK-43300
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 3.4.0
Reporter: Ziqi Liu
Guava cache is widely used in Spark. However, it suffers from fate-sharing
behavior: if multiple requests try to access the same key in the {{cache}} at
the same time and the key is not yet in the cache, Guava cache blocks all of
the requests and creates the object only once. If the creation fails, all of
the blocked requests fail immediately without retrying. As a result, a task
may fail because of an unrelated failure in another query.
This fate-sharing behavior might lead to unexpected results in some
situations. We can wrap the Guava cache with a KeyLock that synchronizes all
requests for the same key, so that they run one at a time and each request
succeeds or fails on its own, as if the requests had arrived sequentially.
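
A minimal sketch of the KeyLock idea, for illustration only. It is not the
proposed patch: the class names KeyLock and KeyLockedCache and their methods
are hypothetical, and a plain ConcurrentHashMap stands in for the Guava cache
so that the example has no dependencies outside the JDK. With a real Guava
cache, the load would presumably be issued inside the same per-key critical
section.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-key lock: callers using the same key run one at a time. */
final class KeyLock<K> {
    // One monitor object per key. Simplification: entries are never removed,
    // so a long-lived lock with many distinct keys would need cleanup.
    private final ConcurrentHashMap<K, Object> locks = new ConcurrentHashMap<>();

    <T> T withLock(K key, Callable<T> body) throws Exception {
        Object lock = locks.computeIfAbsent(key, k -> new Object());
        synchronized (lock) {
            return body.call();
        }
    }
}

/** Hypothetical wrapper; the map below is a stand-in for the Guava cache. */
final class KeyLockedCache<K, V> {
    private final ConcurrentHashMap<K, V> cache = new ConcurrentHashMap<>();
    private final KeyLock<K> keyLock = new KeyLock<>();

    V get(K key, Callable<V> loader) throws Exception {
        V cached = cache.get(key);
        if (cached != null) {
            return cached; // fast path: no locking on a hit
        }
        return keyLock.withLock(key, () -> {
            // Re-check: an earlier caller may have populated the entry
            // while this caller was waiting for the key's lock.
            V again = cache.get(key);
            if (again != null) {
                return again;
            }
            // A failure here propagates only to this caller. The next
            // caller for the same key runs its own loader afterwards.
            V value = loader.call();
            cache.put(key, value);
            return value;
        });
    }
}
```

In this sketch, a failed load leaves the entry absent, so a later request for
the same key retries with its own loader instead of inheriting the earlier
exception. The trade-off is that requests for a missing key are serialized
rather than sharing a single load.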