[
https://issues.apache.org/jira/browse/HDDS-13905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Swaminathan Balachandran updated HDDS-13905:
--------------------------------------------
Description:
Currently, background services acquire the bootstrap lock only after a snapshot
has already been opened. This can lead to a deadlock during bootstrap: the
bootstrap flow has already acquired the bootstrap lock and is waiting for the
snapshot cache lock, which cannot be acquired while the snapshots are still
open, and the services holding those snapshots open are in turn waiting for the
bootstrap lock.
To fix this, all background services should always acquire the bootstrap lock
before opening a snapshot. The only downside is that the entire background
service task is blocked while the bootstrap copy batch is running on the leader
OM, which should be acceptable since bootstrap is an infrequent operation.
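
The lock ordering proposed above can be illustrated with a minimal Java sketch.
This is not the Ozone implementation: the class, field, and method names are
hypothetical, and modelling both locks as ReentrantReadWriteLock instances is an
assumption made only to show the ordering. The point is that both the
background task and the bootstrap flow take the bootstrap lock first and the
snapshot cache lock second, so neither can hold one lock while waiting for the
other in the opposite order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch; none of these names are taken from the Ozone code base. */
public class BootstrapLockOrderSketch {
  // Stand-in for the bootstrap lock (assumed read/write for illustration):
  // background tasks share it, the bootstrap flow takes it exclusively.
  private final ReentrantReadWriteLock bootstrapLock = new ReentrantReadWriteLock();
  // Stand-in for the snapshot cache lock: held in read mode while a snapshot
  // is open, and in write mode by the bootstrap flow.
  private final ReentrantReadWriteLock snapshotCacheLock = new ReentrantReadWriteLock();
  private final List<String> events = new ArrayList<>();

  /** Background service task: bootstrap lock FIRST, then open the snapshot. */
  public void runBackgroundTask(String snapshotName) {
    bootstrapLock.readLock().lock();
    try {
      snapshotCacheLock.readLock().lock(); // models opening the snapshot
      try {
        record("processed " + snapshotName);
      } finally {
        snapshotCacheLock.readLock().unlock(); // models closing the snapshot
      }
    } finally {
      bootstrapLock.readLock().unlock();
    }
  }

  /** Bootstrap flow: same order, bootstrap lock and then snapshot cache lock. */
  public void runBootstrapCopyBatch() {
    bootstrapLock.writeLock().lock();
    try {
      // No background task can open a new snapshot now, so this lock becomes
      // available as soon as already-running tasks close their snapshots.
      snapshotCacheLock.writeLock().lock();
      try {
        record("bootstrap copy batch");
      } finally {
        snapshotCacheLock.writeLock().unlock();
      }
    } finally {
      bootstrapLock.writeLock().unlock();
    }
  }

  private synchronized void record(String event) {
    events.add(event);
  }

  /** Returns a copy of the recorded events, in the order they happened. */
  public synchronized List<String> events() {
    return new ArrayList<>(events);
  }
}
```

In this sketch a background task started during a bootstrap copy batch blocks
at the first lock acquisition, before any snapshot is opened, which is the
trade-off the description accepts.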
was: Currently bootstrap lock is acquired after the snapshot is already opened
thus this can lead to a deadlock condition during bootstrap where the Bootstrap
flow has already acquired a bootstrap lock and is waiting on snapshot cache
lock to be acquired which cannot be acquired since the snapshots are still open.
> Bootstrap lock acquired in background services can lead to deadlock
> -------------------------------------------------------------------
>
> Key: HDDS-13905
> URL: https://issues.apache.org/jira/browse/HDDS-13905
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Swaminathan Balachandran
> Assignee: Swaminathan Balachandran
> Priority: Major
> Labels: pull-request-available
>
> Currently, background services acquire the bootstrap lock only after a
> snapshot has already been opened. This can lead to a deadlock during
> bootstrap: the bootstrap flow has already acquired the bootstrap lock and is
> waiting for the snapshot cache lock, which cannot be acquired while the
> snapshots are still open, and the services holding those snapshots open are
> in turn waiting for the bootstrap lock.
> To fix this, all background services should always acquire the bootstrap lock
> before opening a snapshot. The only downside is that the entire background
> service task is blocked while the bootstrap copy batch is running on the
> leader OM, which should be acceptable since bootstrap is an infrequent
> operation.