L. C. Hsieh created SPARK-53948:
-----------------------------------
Summary: Fix deadlock in Observation
Key: SPARK-53948
URL: https://issues.apache.org/jira/browse/SPARK-53948
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.5.7
Reporter: L. C. Hsieh
The Observation class changed several times between Spark 3.5 and Spark 4.0.0.
Previously it used a locking mechanism (synchronized) between the get and
onFinish methods to coordinate metrics updates and retrieval.
But that mechanism has a potential deadlock. If get is called before
ObservationListener is triggered to call onFinish, get waits forever for the
metrics: it locks the observation object via synchronized, so the later
onFinish call is locked out and can never update the metrics.
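
The failure mode described above can be illustrated with a small sketch. This
is not the Spark 3.5 Observation source: BlockingObservation and LockedOutDemo
are hypothetical names, the waiter is assumed to keep the monitor for its whole
wait, and a timeout is added only so that the demo terminates instead of
hanging.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-in for the failure mode; NOT the Spark 3.5 source.
class BlockingObservation {
    private Map<String, Long> metrics; // guarded by the monitor on `this`
    final CountDownLatch getterHoldsLock = new CountDownLatch(1);

    // Assumption: get keeps the monitor while it waits for the metrics.
    // The timeout exists only so this demo terminates.
    synchronized Optional<Map<String, Long>> get(long timeoutMillis) {
        getterHoldsLock.countDown(); // signal: the monitor is now held
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        try {
            while (metrics == null && System.nanoTime() < deadline) {
                Thread.sleep(10); // sleep does not release the monitor
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return Optional.ofNullable(metrics);
    }

    // Needs the same monitor, so it cannot run while get is waiting.
    synchronized void onFinish(Map<String, Long> observed) {
        metrics = observed;
    }
}

class LockedOutDemo {
    // Returns true only if get saw the metrics before its timeout expired.
    static boolean getSawMetrics() {
        BlockingObservation obs = new BlockingObservation();
        Thread listener = new Thread(() -> {
            try {
                obs.getterHoldsLock.await();       // wait until get holds the lock
                obs.onFinish(Map.of("rows", 42L)); // blocks until get returns
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        listener.start();
        Optional<Map<String, Long>> seen = obs.get(200);
        try {
            listener.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen.isPresent();
    }
}
```

In this sketch LockedOutDemo.getSawMetrics() returns false: onFinish cannot
enter its synchronized method until get gives up, so get never sees the
metrics. Without the timeout, both threads would wait on each other forever.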
SPARK-49423 replaced this locking mechanism with a promise. But that PR does
not mention the deadlock, and no bug-fix PR was proposed for earlier versions.
So I think the bug was not known and the fix in Spark 4.0.0 was unintentional.
The bug is still present in the Spark 3.5 branch.
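
For comparison, here is a minimal sketch of promise-style coordination. It is
not the code from SPARK-49423: PromiseObservation and PromiseDemo are
hypothetical names, and Java's CompletableFuture is used as a stand-in for the
promise. The point is that the waiter blocks on the future rather than on a
monitor shared with onFinish, so the two calls cannot lock each other out.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; CompletableFuture stands in for the promise.
class PromiseObservation {
    private final CompletableFuture<Map<String, Long>> promise =
        new CompletableFuture<>();

    // Blocks on the future without holding any monitor on this object.
    Map<String, Long> get(long timeoutMillis) {
        try {
            return promise.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (Exception e) {
            throw new IllegalStateException("metrics not available", e);
        }
    }

    // Completes the future at most once; returns true if this call set it.
    boolean onFinish(Map<String, Long> observed) {
        return promise.complete(observed);
    }
}

class PromiseDemo {
    // get is called first; the listener thread delivers the metrics later.
    static long rowsSeenByGet() {
        PromiseObservation obs = new PromiseObservation();
        Thread listener = new Thread(() -> {
            try {
                Thread.sleep(50); // simulate the query finishing after get starts
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            obs.onFinish(Map.of("rows", 42L));
        });
        listener.start();
        return obs.get(5000).get("rows");
    }
}
```

Here PromiseDemo.rowsSeenByGet() returns 42 even though get is called before
onFinish, because completing the future does not require any lock that the
waiting thread holds.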