[ 
https://issues.apache.org/jira/browse/HDDS-5338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17368912#comment-17368912
 ] 

Hanisha Koneru commented on HDDS-5338:
--------------------------------------

{quote}If it has more than 1 node, that means it is already ratis enabled 
cluster, why do we need to download checkpoint at all in this scenario (In this 
case should ratis take care of downloading checkpoint through installSnapshot)?
{quote}
Yes, more than one node means Ratis is enabled. But here is the scenario I have 
in mind. om1 was the leader; it got network partitioned and its DB got 
corrupted. om2 is the new leader for om2 and om3, while om1 still falsely 
thinks it is the leader. om4 bootstraps and downloads a corrupted checkpoint 
from om1 (as om1 still thinks it is the leader). om4 sends a setConf request to 
om1, which fails because om1 now steps down from leadership. om4 then sends the 
setConf request to om2 and it succeeds. om4 joins the ring with a corrupted DB. 
I know this is an extreme corner case, but I think the checkpoint download and 
setConf should be atomic (or as close to atomic as possible). So I think the 
install-checkpoint notification should come from the leader node when it is 
executing the setConf.
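
A minimal sketch of that ordering, under stated assumptions: `OmNode` and 
`BootstrapCoordinator` are hypothetical names for illustration only, not Ozone 
or Ratis classes, and the quorum check is reduced to a boolean flag. The point 
is only that the node handling setConf is also the node that supplies the 
checkpoint, so a stale leader cannot be the checkpoint source.

```java
// Hypothetical model, not Ozone or Ratis code.
class OmNode {
    final String id;
    boolean believesLeader;          // local belief; may be stale after a partition
    boolean quorumConfirmed;         // assumed result of a leadership check with a majority
    boolean dbCorrupted;
    String installedCheckpointFrom;  // id of the node whose checkpoint was installed

    OmNode(String id, boolean believesLeader, boolean quorumConfirmed, boolean dbCorrupted) {
        this.id = id;
        this.believesLeader = believesLeader;
        this.quorumConfirmed = quorumConfirmed;
        this.dbCorrupted = dbCorrupted;
    }
}

class BootstrapCoordinator {
    /**
     * Runs on the node that receives the setConf request.
     * Returns true if newNode was admitted to the ring.
     */
    static boolean handleSetConf(OmNode receiver, OmNode newNode) {
        // A stale leader cannot confirm its leadership, so it steps down
        // and the new node installs nothing from it.
        if (!receiver.believesLeader || !receiver.quorumConfirmed) {
            receiver.believesLeader = false;
            return false;
        }
        // The confirmed leader notifies the new node to install its checkpoint
        // as part of handling setConf, so both steps share one source.
        newNode.installedCheckpointFrom = receiver.id;
        newNode.dbCorrupted = receiver.dbCorrupted;
        return true;
    }
}
```

In this model, a setConf sent to the partitioned om1 is rejected before any 
checkpoint is installed, and a retry against om2 installs om2's checkpoint.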


{quote}Now the question will be how to distinguish when to download, so if we 
can pass a flag to bootstrap node that it is being converted from non-ha to ha 
then only download snapshot. (This is just one of the way to solve this)
{quote}
RocksDB does not have this information, so I am not sure how we can 
distinguish this case. The easiest solution would be to download the checkpoint 
every time. I think this will not be a significant overhead: most likely at 
least some Ratis logs would have been purged, and if none have been purged, the 
DB checkpoint should be quite small. 
Please let me know if you have any ideas on how we can distinguish this case.
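
A toy model of why always downloading the checkpoint covers the non-Ratis 
case, assuming the DB is a simple key/value map and each Ratis log entry is a 
single put. `BootstrapModel` and its methods are hypothetical and only 
illustrate the reasoning; they do not reflect how Ozone stores or replays 
transactions.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model, not Ozone or Ratis code.
class BootstrapModel {
    /** New OM replays only the Ratis log onto an empty DB. */
    static Map<String, String> bootstrapFromLogs(List<String[]> ratisLog) {
        Map<String, String> db = new TreeMap<>();
        for (String[] entry : ratisLog) {
            db.put(entry[0], entry[1]);
        }
        return db;
    }

    /**
     * New OM copies the existing OM's checkpoint, then replays only the log
     * entries that were not yet applied when the checkpoint was taken.
     */
    static Map<String, String> bootstrapFromCheckpoint(
            Map<String, String> checkpoint, List<String[]> ratisLog, int appliedCount) {
        Map<String, String> db = new TreeMap<>(checkpoint);
        for (String[] entry : ratisLog.subList(appliedCount, ratisLog.size())) {
            db.put(entry[0], entry[1]);
        }
        return db;
    }
}
```

If the existing OM's DB holds a key written before Ratis was enabled, the 
log-only path misses it, while the checkpoint path reproduces the full DB 
without needing to know whether such keys exist.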

> Handle Bootstrap when original OM has non-ratis transactions
> ------------------------------------------------------------
>
>                 Key: HDDS-5338
>                 URL: https://issues.apache.org/jira/browse/HDDS-5338
>             Project: Apache Ozone
>          Issue Type: Sub-task
>    Affects Versions: 1.2.0
>            Reporter: Hanisha Koneru
>            Assignee: Hanisha Koneru
>            Priority: Major
>
> When a non-Ratis OM is converted to a Ratis-enabled OM, there could be 
> transactions in the RocksDB which are not part of the Ratis logs. If the 
> Ratis logs are not purged when a new OM is bootstrapped, it will just get all 
> the Ratis logs from the old OM. The non-ratis transactions in the RocksDB 
> will not be transferred to the new OM as Ratis will not know that there are 
> transactions in the DB not present in the logs. 
> So when a new OM is bootstrapping, we should check the DB for non-ratis 
> transactions and if any are present, the new OM should download the DB from 
> existing OM before the setConf request is sent out.
> Thanks [~bharat] for identifying this scenario 
> [here|https://github.com/apache/ozone/pull/1494#issuecomment-859329558] .



