Ziqi Liu created SPARK-43300:
--------------------------------

             Summary: Cascade failure in Guava cache due to fate-sharing
                 Key: SPARK-43300
                 URL: https://issues.apache.org/jira/browse/SPARK-43300
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.4.0
            Reporter: Ziqi Liu


Guava cache is widely used in Spark. However, it exhibits fate-sharing 
behavior: if multiple requests try to access the same key in the {{cache}} at 
the same time and the key is not yet in the cache, Guava cache blocks all of 
the requests and creates the object only once. If the creation fails, all of 
the requests fail immediately without retrying. As a result, a task can fail 
because of an unrelated failure in another query.

This fate-sharing behavior might lead to unexpected results in some situations.
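
The behavior described above can be reproduced with a small sketch. It uses 
only the JDK and does not use Guava: {{SharedLoadCache}}, its {{joined}} 
counter, and {{FateSharingDemo}} are hypothetical names invented for this 
illustration. They only mimic the "load once, share the outcome" semantics, 
and the sketch makes no claim about how Guava is implemented internally.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical JDK-only stand-in for a load-once cache (NOT Guava's code).
class SharedLoadCache<K, V> {
    private final ConcurrentHashMap<K, CompletableFuture<V>> entries =
        new ConcurrentHashMap<>();
    // Number of requests that attached to another request's in-flight load.
    final AtomicInteger joined = new AtomicInteger();

    V get(K key, Callable<V> loader) throws Exception {
        CompletableFuture<V> mine = new CompletableFuture<>();
        CompletableFuture<V> existing = entries.putIfAbsent(key, mine);
        if (existing != null) {
            // Another request is already loading this key: share its fate.
            joined.incrementAndGet();
            return existing.get(); // throws ExecutionException if that load failed
        }
        try {
            V value = loader.call();
            mine.complete(value);
            return value;
        } catch (Exception e) {
            entries.remove(key, mine);     // do not cache the failure
            mine.completeExceptionally(e); // fail every request that joined
            throw e;
        }
    }
}

class FateSharingDemo {
    // Returns {loaderCalls, failedRequests}.
    static int[] run() {
        SharedLoadCache<String, String> cache = new SharedLoadCache<>();
        AtomicInteger loads = new AtomicInteger();
        AtomicInteger failures = new AtomicInteger();
        Callable<String> failingLoader = () -> {
            loads.incrementAndGet();
            // Hold the load open until the second request has joined it.
            while (cache.joined.get() < 1) {
                Thread.sleep(1);
            }
            throw new IllegalStateException("load failed");
        };
        Runnable request = () -> {
            try {
                cache.get("k", failingLoader);
            } catch (Exception e) {
                failures.incrementAndGet();
            }
        };
        Thread a = new Thread(request);
        Thread b = new Thread(request);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] {loads.get(), failures.get()};
    }

    public static void main(String[] args) {
        int[] r = run();
        // prints "1 load(s), 2 failed request(s)"
        System.out.println(r[0] + " load(s), " + r[1] + " failed request(s)");
    }
}
```

The loader runs once, yet both requests fail: the second request never gets a 
chance to attempt its own load.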

We can wrap the Guava cache with a KeyLock that synchronizes all requests for 
the same key, so that they run individually and fail as if they had arrived 
one at a time.
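
A minimal sketch of this idea, under stated assumptions: {{KeyLockedCache}} 
and {{KeyLockDemo}} are hypothetical names, a plain {{ConcurrentHashMap}} 
stands in for the underlying Guava cache, and the per-key lock objects are 
never cleaned up. It is meant to show the locking pattern, not the eventual 
Spark implementation.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical wrapper; the names are illustrative, not Spark's or Guava's API.
class KeyLockedCache<K, V> {
    // Stand-in for the wrapped cache; assumes loaders return non-null values.
    private final ConcurrentHashMap<K, V> values = new ConcurrentHashMap<>();
    // One lock object per key (never evicted in this sketch).
    private final ConcurrentHashMap<K, Object> locks = new ConcurrentHashMap<>();

    V get(K key, Callable<V> loader) throws Exception {
        V cached = values.get(key);
        if (cached != null) {
            return cached;
        }
        Object lock = locks.computeIfAbsent(key, k -> new Object());
        synchronized (lock) {
            // Re-check: an earlier holder of the lock may have loaded the value.
            cached = values.get(key);
            if (cached != null) {
                return cached;
            }
            // A failure here propagates only to the current caller; the next
            // request for this key acquires the lock and runs its own load.
            V loaded = loader.call();
            values.put(key, loaded);
            return loaded;
        }
    }
}

class KeyLockDemo {
    // Returns {loaderCalls, failedRequests, succeededRequests}.
    static int[] run() {
        KeyLockedCache<String, String> cache = new KeyLockedCache<>();
        AtomicInteger loads = new AtomicInteger();
        AtomicInteger failures = new AtomicInteger();
        AtomicInteger successes = new AtomicInteger();
        // Fails on its first invocation only, then succeeds.
        Callable<String> flakyLoader = () -> {
            if (loads.incrementAndGet() == 1) {
                throw new IllegalStateException("first load fails");
            }
            return "value";
        };
        Runnable request = () -> {
            try {
                cache.get("k", flakyLoader);
                successes.incrementAndGet();
            } catch (Exception e) {
                failures.incrementAndGet();
            }
        };
        Thread a = new Thread(request);
        Thread b = new Thread(request);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] {loads.get(), failures.get(), successes.get()};
    }

    public static void main(String[] args) {
        int[] r = run();
        // prints "2 load(s), 1 failed, 1 succeeded"
        System.out.println(r[0] + " load(s), " + r[1] + " failed, " + r[2] + " succeeded");
    }
}
```

Because loads for the same key are serialized, only the request whose own 
load failed sees the error; the other request performs a fresh load and 
succeeds. The trade-off is that a failing load may be attempted once per 
waiting request instead of once overall.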



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
