L. C. Hsieh created SPARK-53948:
-----------------------------------

             Summary: Fix deadlock in Observation
                 Key: SPARK-53948
                 URL: https://issues.apache.org/jira/browse/SPARK-53948
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.5.7
            Reporter: L. C. Hsieh


The Observation class evolved several times between Spark 3.5 and Spark 4.0.0. 
Previously, it used a locking mechanism (synchronized) between the get and 
onFinish methods to coordinate metrics update and retrieval.

However, this mechanism has a potential deadlock. If get is called before 
ObservationListener is triggered to call onFinish, get waits forever for the 
metrics: it locks the Observation object via synchronized, so the later 
onFinish call is locked out and can never update the metrics.
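
To make the failure mode concrete, here is a minimal Java sketch of the
pattern as described above. It is not Spark's Observation code: the class
LockedObservationSketch, its method bodies, and the metric name "rows" are
hypothetical, and the timeout in get exists only so that the demo terminates
instead of hanging.

```java
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the pattern described in this issue; not Spark code.
class LockedObservationSketch {
    private Map<String, Object> metrics;                     // guarded by "this"
    final CountDownLatch entered = new CountDownLatch(1);    // demo-only signal

    // Holds the monitor for the whole wait. The timeout is an assumption added
    // so the demo ends; without it this loop would never exit.
    synchronized Map<String, Object> get(long timeoutMillis) throws InterruptedException {
        entered.countDown();
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (metrics == null && System.currentTimeMillis() < deadline) {
            Thread.sleep(10);                                // sleep keeps the monitor
        }
        return metrics;
    }

    // Needs the same monitor, so it cannot run while get is waiting.
    synchronized void onFinish(Map<String, Object> observed) {
        metrics = observed;
    }

    // Returns true if the listener thread was seen BLOCKED while get held the
    // monitor, and get gave up without ever seeing the metrics.
    static boolean listenerWasLockedOut() throws InterruptedException {
        LockedObservationSketch obs = new LockedObservationSketch();
        AtomicReference<Map<String, Object>> seen = new AtomicReference<>();

        Thread getter = new Thread(() -> {
            try {
                seen.set(obs.get(500));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        getter.start();
        obs.entered.await();                                 // get now owns the monitor

        Thread listener = new Thread(() -> obs.onFinish(Map.of("rows", 42L)));
        listener.start();

        boolean blocked = false;
        long deadline = System.currentTimeMillis() + 400;
        while (!blocked && System.currentTimeMillis() < deadline) {
            blocked = listener.getState() == Thread.State.BLOCKED;
            Thread.sleep(5);
        }

        getter.join();
        listener.join();
        return blocked && seen.get() == null;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("listener locked out: " + listenerWasLockedOut());
    }
}
```

In the sketch the listener only gets through once get gives up and releases
the monitor. With an unbounded wait, that never happens.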

This locking mechanism was replaced by a promise in SPARK-49423. However, that 
PR does not mention the deadlock, and no bug-fix PR was proposed for earlier 
versions. So I think the bug was not known and the fix in Spark 4.0.0 was 
unintentional. The bug is still present in the Spark 3.5 branch.
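
For comparison, a promise-based coordination can be sketched as follows. This
is only an illustration of the general idea, assuming Java's
CompletableFuture as a stand-in for the promise; it is not the code from
SPARK-49423, and the class PromiseObservationSketch, the timeout parameter,
and the metric name "rows" are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch of promise-style coordination; not Spark code.
class PromiseObservationSketch {
    private final CompletableFuture<Map<String, Object>> promise = new CompletableFuture<>();

    // Waits on the promise without holding any monitor that onFinish needs.
    // The timeout is an assumption added for the demo.
    Map<String, Object> get(long timeoutMillis)
            throws InterruptedException, ExecutionException, TimeoutException {
        return promise.get(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    // Completes the promise; only the first completion takes effect.
    void onFinish(Map<String, Object> observed) {
        promise.complete(observed);
    }

    // Calls get before the listener fires, mirroring the problematic ordering.
    static Map<String, Object> demo() throws Exception {
        PromiseObservationSketch obs = new PromiseObservationSketch();
        Thread listener = new Thread(() -> {
            try {
                Thread.sleep(100);                           // listener fires later
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            obs.onFinish(Map.of("rows", 42L));
        });
        listener.start();
        Map<String, Object> result = obs.get(2000);          // already waiting
        listener.join();
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```

Because get and onFinish share no lock here, calling get first does not
prevent the listener from delivering the metrics.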



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
