Livia Zhu created SPARK-51596:
---------------------------------

             Summary: Fix concurrent StateStoreProvider maintenance and closing
                 Key: SPARK-51596
                 URL: https://issues.apache.org/jira/browse/SPARK-51596
             Project: Spark
          Issue Type: Task
          Components: Structured Streaming
    Affects Versions: 4.0.0
            Reporter: Livia Zhu


Currently, both the task thread and the maintenance thread can call unload() on
a provider. This creates a race condition: the maintenance thread can be running
maintenance on a provider while the task thread is closing it, which leads to
unexpected behavior (such as a UCSHandle closed exception).
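
A minimal Python sketch of the interleaving described above, assuming a
hypothetical Provider class (this is not Spark's StateStoreProvider API). Two
threading.Event objects force the bad ordering deterministically: maintenance
picks up the provider, the task thread closes it, and maintenance then fails on
the closed handle, which is analogous to the closed-handle exception.

```python
import threading


class Provider:
    """Hypothetical stand-in for a StateStoreProvider with a closable handle."""

    def __init__(self):
        self.closed = False

    def do_maintenance(self):
        # A real provider would touch its underlying store handle here.
        if self.closed:
            raise RuntimeError("handle closed")

    def close(self):
        self.closed = True


provider = Provider()
maintenance_started = threading.Event()
provider_closed = threading.Event()
errors = []


def maintenance_thread():
    maintenance_started.set()  # maintenance has selected this provider...
    provider_closed.wait()     # ...but is descheduled before touching it
    try:
        provider.do_maintenance()
    except RuntimeError as e:
        errors.append(str(e))


def task_thread():
    maintenance_started.wait()  # task thread unloads while maintenance is in flight
    provider.close()
    provider_closed.set()


threads = [threading.Thread(target=maintenance_thread),
           threading.Thread(target=task_thread)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(errors)  # ['handle closed']
```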

We want to guarantee that a provider is never closed while maintenance is
running on it. The simplest way to do this is to move the unload operation into
the maintenance thread. To keep unloading providers as soon as possible, as
introduced by https://issues.apache.org/jira/browse/SPARK-33827, rather than
waiting up to a full maintenance interval, the task thread should immediately
trigger the maintenance thread to perform the unload.
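
A minimal sketch of this ownership model, assuming hypothetical Provider and
MaintenanceManager classes (these are not Spark's StateStore APIs; Python is
used only for illustration). Task threads only record an unload request and set
an event; the single maintenance thread performs both unloading and maintenance,
so close() can never overlap with do_maintenance() on the same provider. The
event lets an unload run right away instead of after up to interval_s seconds.

```python
import threading
import time


class Provider:
    """Hypothetical stand-in for a StateStoreProvider."""

    def __init__(self, name):
        self.name = name
        self.closed = False
        self.maintenance_runs = 0

    def do_maintenance(self):
        if self.closed:
            raise RuntimeError(f"maintenance on closed provider {self.name}")
        self.maintenance_runs += 1

    def close(self):
        self.closed = True


class MaintenanceManager:
    """Only the maintenance thread closes providers; task threads request unloads."""

    def __init__(self, interval_s):
        self.interval_s = interval_s
        self.lock = threading.Lock()    # guards loaded and pending_unload
        self.loaded = {}                # name -> Provider
        self.pending_unload = set()
        self.wake = threading.Event()   # set to run maintenance immediately
        self.stopped = False
        self.thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self.thread.start()

    def load(self, name):
        with self.lock:
            if name not in self.loaded:
                self.loaded[name] = Provider(name)
            return self.loaded[name]

    def request_unload(self, name):
        # Called from a task thread: record the request and wake the
        # maintenance thread instead of closing the provider here.
        with self.lock:
            self.pending_unload.add(name)
        self.wake.set()

    def stop(self):
        self.stopped = True
        self.wake.set()
        self.thread.join()

    def _run(self):
        while not self.stopped:
            # Wake on the regular interval, or early when an unload is requested.
            self.wake.wait(self.interval_s)
            self.wake.clear()
            self._maintain_once()

    def _maintain_once(self):
        with self.lock:
            to_unload = [self.loaded.pop(n) for n in self.pending_unload if n in self.loaded]
            self.pending_unload.clear()
            to_maintain = list(self.loaded.values())
        # Unload and maintenance run on the same thread, so a provider is
        # never closed while its maintenance is in progress.
        for p in to_unload:
            p.close()
        for p in to_maintain:
            p.do_maintenance()


# Usage: a long interval means only the explicit trigger wakes the thread.
m = MaintenanceManager(interval_s=60)
a, b = m.load("a"), m.load("b")
m.start()
m.request_unload("a")
deadline = time.monotonic() + 5
while not a.closed and time.monotonic() < deadline:
    time.sleep(0.01)
m.stop()
```

Setting the event rather than closing inline preserves prompt unloading while
giving the maintenance thread exclusive control over provider lifetime.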

This has the added benefit that unloading other providers no longer blocks the
task thread. To capitalize on this, unload() should not hold the
loadedProviders lock for its entire duration (which would block other task
threads). Instead, it should release the lock as soon as it has removed the
providers being unloaded from the map, and then close those providers without
holding the lock.
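
A minimal sketch of the narrower lock scope, assuming a hypothetical
module-level loaded_providers map guarded by loaded_providers_lock (the names
mirror the description but are illustrative, not Spark's code). unload() pops
the target providers while holding the lock and calls close() only after
releasing it; each provider records whether the lock was held during close() so
the behavior can be checked.

```python
import threading


class Provider:
    """Hypothetical provider whose close() records whether the map lock was held."""

    def __init__(self, name, lock):
        self.name = name
        self._lock = lock
        self.closed = False
        self.lock_held_during_close = None

    def close(self):
        # lock.locked() reports whether any thread currently holds the lock.
        self.lock_held_during_close = self._lock.locked()
        self.closed = True


loaded_providers_lock = threading.Lock()
loaded_providers = {}


def unload(names):
    """Remove providers from the map under the lock, then close them unlocked."""
    with loaded_providers_lock:
        to_close = [loaded_providers.pop(n) for n in names if n in loaded_providers]
    # The lock is released here, so task threads can load or look up other
    # providers while the (potentially slow) close() calls below run.
    for provider in to_close:
        provider.close()
    return to_close


for n in ["p1", "p2", "p3"]:
    loaded_providers[n] = Provider(n, loaded_providers_lock)

closed = unload(["p1", "p3"])
print([p.name for p in closed], sorted(loaded_providers))  # ['p1', 'p3'] ['p2']
```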



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
