Livia Zhu created SPARK-51596:
---------------------------------
Summary: Fix concurrent StateStoreProvider maintenance and closing
Key: SPARK-51596
URL: https://issues.apache.org/jira/browse/SPARK-51596
Project: Spark
Issue Type: Task
Components: Structured Streaming
Affects Versions: 4.0.0
Reporter: Livia Zhu
Currently, both the task thread and the maintenance thread can call unload() on
a provider. This leads to a race condition: the maintenance thread could be
performing maintenance on a provider while the task thread is closing it,
leading to unexpected behavior (such as a UCSHandle closed exception).
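The interleaving described above can be reproduced deterministically with a
small Python sketch. FakeProvider, ProviderClosedError and reproduce_race are
hypothetical stand-ins, not Spark APIs. Two events force the bad ordering: the
maintenance thread checks that the provider is open, pauses, the "task thread"
closes the provider, and maintenance then continues on a closed provider.

```python
import threading


class ProviderClosedError(RuntimeError):
    """Hypothetical stand-in for the 'handle closed' failure."""


class FakeProvider:
    """Hypothetical stand-in for a StateStoreProvider with a closable handle."""

    def __init__(self):
        self.closed = False

    def do_maintenance(self, mid_point: threading.Event, resume: threading.Event):
        if self.closed:
            return  # nothing to do on an already-closed provider
        mid_point.set()  # signal: maintenance is in progress
        resume.wait()    # pause so the task thread can interleave here
        if self.closed:
            # The provider was closed underneath the maintenance work: the race.
            raise ProviderClosedError("handle closed during maintenance")

    def close(self):
        self.closed = True


def reproduce_race() -> list:
    provider = FakeProvider()
    errors = []
    mid_point, resume = threading.Event(), threading.Event()

    def maintenance():
        try:
            provider.do_maintenance(mid_point, resume)
        except ProviderClosedError as e:
            errors.append(e)

    t = threading.Thread(target=maintenance)
    t.start()
    mid_point.wait()  # maintenance has already seen closed == False ...
    provider.close()  # ... and now the task thread unloads and closes it
    resume.set()
    t.join()
    return errors


print(len(reproduce_race()))  # prints 1
```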
We want to guarantee that a provider is never closed while maintenance is
running on it. The simplest way to do this is to move the unload operation into
the maintenance thread. To keep unloading providers as soon as possible (rather
than potentially waiting for the next maintenance interval), as done in
https://issues.apache.org/jira/browse/SPARK-33827, we should immediately
trigger the maintenance thread to perform the unload.
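A minimal sketch of this idea, assuming a single maintenance worker thread
(here a ThreadPoolExecutor with max_workers=1). MaintenanceCoordinator,
request_unload and the Provider class are hypothetical names, not Spark's
actual implementation. Because periodic maintenance and unload requests are
queued on the same single worker, an unload can never run concurrently with
maintenance, and the task thread only enqueues the unload instead of closing
the provider itself.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor


class Provider:
    """Hypothetical provider used only for illustration."""

    def __init__(self, pid):
        self.pid = pid
        self.closed = False
        self.maintenance_runs = 0

    def do_maintenance(self):
        if self.closed:
            raise RuntimeError("maintenance on a closed provider")
        self.maintenance_runs += 1

    def close(self):
        self.closed = True


class MaintenanceCoordinator:
    """Hypothetical: all maintenance and all unloads run on one worker thread."""

    def __init__(self):
        self._worker = ThreadPoolExecutor(max_workers=1)
        self._lock = threading.Lock()
        self.loaded = {}

    def register(self, provider):
        with self._lock:
            self.loaded[provider.pid] = provider

    def run_maintenance(self) -> Future:
        # Periodic maintenance; in Spark this is driven by a configured interval.
        return self._worker.submit(self._maintain_all)

    def request_unload(self, pid) -> Future:
        # Called from the task thread: enqueue the unload right away rather
        # than waiting for the next maintenance interval.
        return self._worker.submit(self._unload, pid)

    def _maintain_all(self):
        with self._lock:
            providers = list(self.loaded.values())
        for p in providers:
            p.do_maintenance()

    def _unload(self, pid):
        with self._lock:
            p = self.loaded.pop(pid, None)
        if p is not None:
            p.close()

    def shutdown(self):
        self._worker.shutdown(wait=True)


coord = MaintenanceCoordinator()
p1, p2 = Provider(1), Provider(2)
coord.register(p1)
coord.register(p2)
coord.run_maintenance()
coord.request_unload(1).result()
coord.run_maintenance().result()
coord.shutdown()
print(p1.closed, p1.maintenance_runs, p2.maintenance_runs)  # True 1 2
```

The single worker serializes the three jobs in submission order, so provider 1
is maintained once, then closed, and is no longer in the map for the second
maintenance pass.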
This has the additional benefit that unloading other providers no longer blocks
the task thread. To take full advantage of this, unload() should not hold the
loadedProviders lock for its entire duration (which blocks other task threads).
Instead, it should release the lock as soon as it has removed the providers
being unloaded from the map, and then close those providers without holding the
lock.
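The locking pattern can be sketched as follows, assuming a plain dict guarded by
a threading.Lock in place of Spark's loadedProviders map. unload,
loaded_providers, loaded_providers_lock and Provider are hypothetical Python
stand-ins. Only the map mutation happens under the lock; the potentially slow
close() calls run after it is released. Each provider records whether the lock
was held while it was being closed, which makes the property checkable.

```python
import threading

# Hypothetical stand-ins for the loadedProviders map and its lock.
loaded_providers = {}
loaded_providers_lock = threading.Lock()


class Provider:
    """Hypothetical provider used only for illustration."""

    def __init__(self, pid):
        self.pid = pid
        self.closed = False
        self.lock_held_during_close = None

    def close(self):
        # Closing may be slow (e.g. releasing native resources), so it should
        # not run while loaded_providers_lock is held. Record whether it was.
        self.lock_held_during_close = loaded_providers_lock.locked()
        self.closed = True


def unload(pids):
    """Detach providers from the map under the lock, then close them without it."""
    with loaded_providers_lock:
        # Only the map mutation happens while holding the lock ...
        to_close = [loaded_providers.pop(pid) for pid in pids if pid in loaded_providers]
    # ... so other task threads can look up or load providers while these close.
    for provider in to_close:
        provider.close()
    return to_close


for pid in (1, 2, 3):
    loaded_providers[pid] = Provider(pid)
closed = unload([1, 3, 99])
print(sorted(loaded_providers), [p.pid for p in closed],
      [p.lock_held_during_close for p in closed])
# [2] [1, 3] [False, False]
```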
--
This message was sent by Atlassian Jira
(v8.20.10#820010)