[ 
https://issues.apache.org/jira/browse/HDDS-12090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDDS-12090:
-----------------------------------
    Release Note: 
  OM Follower Bootstrap Failures with Snapshots

  A race condition during the Ozone Manager (OM) bootstrap process could 
corrupt the database on follower nodes when snapshots were in use.
  This could cause the OM bootstrap to fail and impact cluster stability.

  This release introduces a lock to prevent this race condition, ensuring that 
OM bootstrapping is reliable and that the database remains
  consistent.

  As part of this change, a new OM checkpoint endpoint, /dbCheckpointv2, is 
introduced. New clients (2.2.0 and above) use the new endpoint, whereas 
older clients continue to use the old checkpoint endpoint, /dbCheckpoint.

  The new behavior can be switched off by setting the configuration property 
ozone.om.db.checkpoint.use.inode.based.transfer to false. The default is true.
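
  The endpoint choice described above can be sketched roughly as follows. 
This is an illustration only, not Ozone's client code: the function names 
are hypothetical, and it assumes that switching the property off makes a new 
client fall back to the old endpoint, which the note does not state explicitly.

```python
# Hypothetical sketch of the endpoint choice; not taken from the Ozone code base.
OLD_ENDPOINT = "/dbCheckpoint"
NEW_ENDPOINT = "/dbCheckpointv2"
FIRST_VERSION_WITH_V2 = (2, 2, 0)  # "2.2.0 and above" per the release note


def parse_version(version: str) -> tuple:
    """Turn '2.2.0' or '2.2.0-SNAPSHOT' into a comparable tuple like (2, 2, 0)."""
    core = version.split("-", 1)[0]            # drop any '-SNAPSHOT' style suffix
    parts = tuple(int(p) for p in core.split("."))
    return parts + (0,) * (3 - len(parts))     # pad '2.2' to (2, 2, 0)


def checkpoint_endpoint(client_version: str,
                        use_inode_based_transfer: bool = True) -> str:
    """Pick the checkpoint endpoint for a client.

    use_inode_based_transfer mirrors the property
    ozone.om.db.checkpoint.use.inode.based.transfer (default true).
    Assumption: when it is false, the old endpoint is used.
    """
    is_new_client = parse_version(client_version) >= FIRST_VERSION_WITH_V2
    if use_inode_based_transfer and is_new_client:
        return NEW_ENDPOINT
    return OLD_ENDPOINT
```

  Under these assumptions, a 2.2.0 client with the default setting would 
request /dbCheckpointv2, while a 2.1.0 client, or any client with the 
property set to false, would request /dbCheckpoint.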



  was:
  OM Follower Bootstrap Failures with Snapshots

  A race condition during the Ozone Manager (OM) bootstrap process could 
corrupt the database on follower nodes when snapshots were in use.
  This could cause the OM bootstrap to fail and impact cluster stability.

  This release introduces a lock to prevent this race condition, ensuring that 
OM bootstrapping is reliable and that the database remains
  consistent.

  As part of this change, a new OM checkpoint endpoint, /dbCheckpointv2, is 
introduced. New clients (2.2.0 and above) use the new endpoint, whereas 
older clients continue to use the old checkpoint endpoint, /dbCheckpoint.

  There are no configuration changes or special upgrade procedures required for 
this fix.




> Fix Snapshot Bootstrapping race condition to prevent snapshot corruption
> ------------------------------------------------------------------------
>
>                 Key: HDDS-12090
>                 URL: https://issues.apache.org/jira/browse/HDDS-12090
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Swaminathan Balachandran
>            Assignee: Swaminathan Balachandran
>            Priority: Major
>
> The existing bootstrapping logic has an issue when the OM RocksDB has 
> snapshots. No locks are taken during bootstrapping, so it runs concurrently 
> with active transactions on the snapshot RocksDB, which can leave a 
> corrupted RocksDB instance on the follower OM after bootstrap.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
