[
https://issues.apache.org/jira/browse/SPARK-51596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Livia Zhu updated SPARK-51596:
------------------------------
Description:
Currently, both the task thread and the maintenance thread can call unload() on a
provider. This creates a race condition: the maintenance thread could be running
maintenance on a provider while the task thread is closing it, leading to
unexpected behavior.
We want to guarantee that a provider is never closed or closing while
maintenance is running on it. The easiest way to do this is to move the unload
operation into the maintenance thread. To keep unloading ASAP (rather than
potentially waiting for the maintenance interval), as introduced by
https://issues.apache.org/jira/browse/SPARK-33827, we should immediately
trigger a maintenance thread to do the unload.
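A minimal sketch of this hand-off, written in Python purely for illustration and not taken from the Spark code base. The names Provider, maintenance_pool and request_unload are hypothetical, and the single-worker pool is an assumption of the sketch that makes maintenance and closing run one after the other:

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Provider:
    """Hypothetical stand-in for a state store provider."""

    def __init__(self, provider_id):
        self.provider_id = provider_id
        self.closed_by = None

    def close(self):
        # Record which thread performed the close.
        self.closed_by = threading.current_thread().name


# Assumption: one worker, so tasks submitted to the pool never overlap.
maintenance_pool = ThreadPoolExecutor(
    max_workers=1, thread_name_prefix="maintenance"
)


def request_unload(provider):
    """Called from the task thread: submit the close to the maintenance
    pool right away instead of waiting for the next maintenance interval."""
    return maintenance_pool.submit(provider.close)


provider = Provider("p1")
request_unload(provider).result()  # wait only so the demo can inspect the result
print(provider.closed_by)
maintenance_pool.shutdown()
```

The task thread only submits the request; the close itself runs on a maintenance thread.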
This has the extra benefit that unloading other providers no longer blocks the
task thread. To capitalize on this, unload() should not hold the
loadedProviders lock the entire time (which would block other task threads).
Instead, it should release the lock once it has removed the unloading providers
from the map, and then close the providers without the lock held.
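The two-phase unload described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the actual Spark implementation: loaded_providers and loaded_providers_lock stand in for the loadedProviders map and its lock, and Provider and the should_unload predicate are hypothetical.

```python
import threading

loaded_providers = {}  # stands in for the loadedProviders map
loaded_providers_lock = threading.Lock()


class Provider:
    """Hypothetical stand-in for a state store provider."""

    def __init__(self, provider_id):
        self.provider_id = provider_id
        self.closed = False
        self.lock_held_during_close = None

    def close(self):
        # Record whether the map lock was held while closing (demo only).
        self.lock_held_during_close = loaded_providers_lock.locked()
        self.closed = True


def unload(should_unload):
    # Phase 1: hold the lock only long enough to remove the map entries.
    with loaded_providers_lock:
        to_close = [
            p for pid, p in loaded_providers.items() if should_unload(pid)
        ]
        for p in to_close:
            del loaded_providers[p.provider_id]
    # Phase 2: the lock is released, so a slow close() does not block
    # other threads that need the map.
    for p in to_close:
        p.close()
    return to_close


loaded_providers.update({pid: Provider(pid) for pid in ("a", "b", "c")})
closed = unload(lambda pid: pid != "b")
print(sorted(loaded_providers))  # ['b']
```

Once a provider has been removed from the map, no other thread can look it up, so closing it outside the lock does not expose a half-closed provider through the map.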
was:
Currently, both the task thread and maintenance thread can call unload() on a
provider. This leads to a race condition where the maintenance could be
conducting maintenance while the task thread is closing the provider, leading
to unexpected behavior.
We want to guarantee that when maintenance is run, the provider is not
closed/closing. The easiest way to do this is to move the unload operation into
the maintenance thread. To continue unloading ASAP (rather than potentially
waiting for the maintenance interval) as was done by
https://issues.apache.org/jira/browse/SPARK-33827, we should immediately
trigger a maintenance thread to do the unload.
This gives us an extra benefit that unloading other providers doesn't block the
task thread. To capitalize on this, unload() should not hold the
loadedProviders lock the entire time (which will block other task threads), but
instead release it once it has deleted the unloading providers from the map and
close the providers without the lock held.
> Fix concurrent StateStoreProvider maintenance and closing
> ---------------------------------------------------------
>
> Key: SPARK-51596
> URL: https://issues.apache.org/jira/browse/SPARK-51596
> Project: Spark
> Issue Type: Task
> Components: Structured Streaming
> Affects Versions: 4.0.0
> Reporter: Livia Zhu
> Priority: Major