errose28 commented on pull request #1836:
URL: https://github.com/apache/ozone/pull/1836#issuecomment-769303605


   I think there are some subtle failure cases that can cause the cancel 
prepare request and --upgrade flag to behave in different ways depending on 
where the failure occurs.
   We should probably pick an expected behavior for each request, and make sure 
that it will be observed regardless of where a failure occurs.
   
   My current understanding is that if an OM goes down/gets disconnected and misses a cancel prepare request, it should come back cancelled, but if the majority of OMs are started with the --upgrade flag, the downed OM should still come back up in whatever prepared or unprepared state it was in before the crash. This follows the behavior we would see if all 3 OMs were up during the whole process, because cancel prepare is a ratis request for the whole cluster, while --upgrade applies only to the OM it is passed to.
   If this is the behavior we want, we can use the marker file to indicate preparedness on restart, and the DB to indicate preparedness on snapshot install.
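
   For illustration, here is a minimal sketch of what reading the marker file on a normal restart could look like, assuming the file simply stores the prepare txn index as text; the class name, method name, and file format here are hypothetical, not the actual Ozone implementation:

   ```java
   import java.io.IOException;
   import java.nio.charset.StandardCharsets;
   import java.nio.file.Files;
   import java.nio.file.Path;
   import java.util.OptionalLong;

   final class PrepareMarkerCheck {
     // Returns the prepare txn index if a readable, well-formed marker
     // file exists; empty otherwise. A missing or corrupt marker file is
     // treated the same way: the OM restarts unprepared.
     static OptionalLong readMarker(Path markerFile) {
       try {
         String contents =
             new String(Files.readAllBytes(markerFile), StandardCharsets.UTF_8);
         return OptionalLong.of(Long.parseLong(contents.trim()));
       } catch (IOException | NumberFormatException e) {
         return OptionalLong.empty();
       }
     }
   }
   ```
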
   Below I have outlined some usages of cancel prepare and --upgrade that may lead to different results on an OM depending on when in the process the failure occurs, based on the current implementation.
   
   ## Cases for cancel prepare
   
   ### Cancel Prepare Case 1 (expected)
   
   1. om1 down
   2. om2 and om3 prepared
   3. om2 and om3 cancelled by request
   4. om1 resumed
       - om1 will come back **unprepared**, as expected, because it will not 
see a DB entry for preparation in its snapshot.
   
   ### Cancel Prepare Case 2 (unexpected)
   
   1. om1, om2, and om3 prepared
   2. om1 down
   3. om2 and om3 cancelled by request
   4. om1 resumed
       - om1 comes back **prepared**, because the marker file is read to indicate it is prepared, and the value received in the snapshot is never used.
       - This is not expected behavior, because cancel prepare is a ratis request and should be acknowledged when received in the snapshot.
       - om1 now has all txns and could become leader, moving the cluster to a pseudo-prepared state where only the leader is prepared.
   
   ## Cases for --upgrade
   
   ### --upgrade Case 1 (expected)
   
   1. om1 down
   2. om2 and om3 prepared
   3. om2 and om3 cancelled by --upgrade
   4. om1 resumed
       - om1 comes back **prepared** because the snapshot it receives still has 
a prepare DB entry, and the txn index and prepare index in the DB match.
       - This is expected, because prepare is a ratis request, and --upgrade is 
not.
   
   ### --upgrade Case 2 (maybe unexpected?)
   
   1. om1 down
   2. om2 and om3 prepared
   3. om2 and om3 cancelled by --upgrade
   4. Transaction applied on om2 and om3
   5. om1 resumed
       - om1 sees the prepare DB entry, but the prepare index does not match 
the transaction index.
       - The current implementation does not have an explicit case to handle this, so the default behavior is that the OM comes back **unprepared**.
       - This could be unexpected, because om1 behaves as if both the prepare request and --upgrade were ratis requests applied from the snapshot, even though this is not the case.
   
   - I think this case is atypical usage, however, since we expect all three OMs to be started with --upgrade before the cluster is used again.
   
   ### --upgrade Case 3 (expected)
   
   1. om1, om2, and om3 prepared
   2. om1 down
   3. om2 and om3 cancelled by --upgrade
   4. om1 resumed
       - om1 comes back **prepared** because it has a marker file; the prepare entry in the DB is ignored.
       - This is expected behavior because --upgrade is not a ratis request.
   
   ## Summary
   
   Below is an outline of what I think should happen on each event to produce the desired behavior. Almost all of this is present in the current implementation, but it is outlined here for clarity, with *(new)* marking the changes. This should handle *cancel prepare case 2*. It will also maintain the current behavior for *--upgrade case 3*, although now om1 will first come back prepared because of the marker file, and the subsequent snapshot install will also be processed, leaving it in the prepared state.
   
   This will still not handle *--upgrade case 2*, but I am not sure whether this is a case we need to handle.
   
   Unlike the current implementation, this will leave the prepare entry in the DB, and use matching or mismatching indices to indicate whether that entry means the OM is prepared or unprepared. I think this is less confusing since --upgrade was leaving the entry in the DB anyway, but we can discuss. A sketch of these transitions as code follows the outline below.
   
   - On prepare request:
       1. Write marker file.
       2. Update index of prepare DB entry
   
   - On cancel prepare request (new):
       1. Delete marker file.
       2. Do not touch prepare DB entry
   
   - On --upgrade startup:
       1. Delete marker file
       2. Do not touch prepare DB entry
   
   - On normal startup:
       1. Prepare if valid marker file.
       2. Do not read prepare DB entry.
   
   - On install snapshot (new):
       - If no prepare DB entry:
           - Do nothing. Cluster has never been prepared before.
       - Else:
           - If prepare DB index == txn info DB index:
               - Prepare and write marker file.
           - Else:
               - Delete marker file and cancel prepare.
                   - These steps should succeed even if the OM was not prepared.
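
   To make the outline above concrete, here is a compact sketch of those transitions as plain Java, with boolean and long fields standing in for the real marker file and RocksDB entries; all names here are hypothetical, and this is a model of the proposed behavior rather than the actual OzoneManager code:

   ```java
   import java.util.OptionalLong;

   class PrepareStateSketch {
     private OptionalLong prepareDbIndex = OptionalLong.empty(); // prepare DB entry
     private long txnInfoDbIndex;  // last applied txn index in the txn info DB
     private boolean markerFile;   // whether the on-disk marker file exists
     private boolean prepared;     // in-memory prepare gate

     // On prepare request: write the marker file and set the prepare DB
     // entry to the txn index of the prepare request itself.
     void onPrepareRequest(long txnIndex) {
       txnInfoDbIndex = txnIndex;
       prepareDbIndex = OptionalLong.of(txnIndex);
       markerFile = true;
       prepared = true;
     }

     // On cancel prepare request (new): delete the marker file only; the
     // prepare DB entry is untouched. Cancel is itself a ratis txn, so it
     // advances the txn index past the prepare index.
     void onCancelPrepareRequest(long txnIndex) {
       txnInfoDbIndex = txnIndex;
       markerFile = false;
       prepared = false;
     }

     // On --upgrade startup: same effect, but no ratis txn is involved,
     // so the txn index does not move.
     void onUpgradeStartup() {
       markerFile = false;
       prepared = false;
     }

     // On normal startup: only a valid marker file makes the OM prepared;
     // the prepare DB entry is not read here.
     void onNormalStartup() {
       prepared = markerFile;
     }

     // On install snapshot (new): the DB, not the marker file, decides.
     void onInstallSnapshot() {
       if (!prepareDbIndex.isPresent()) {
         return; // No prepare entry: the cluster has never been prepared.
       }
       if (prepareDbIndex.getAsLong() == txnInfoDbIndex) {
         // Prepare was the last applied txn: come back prepared, and
         // rewrite the marker file so a later plain restart agrees.
         markerFile = true;
         prepared = true;
       } else {
         // Txns (e.g. cancel prepare) were applied after prepare: delete
         // the marker file and cancel. Both are no-ops if unprepared.
         markerFile = false;
         prepared = false;
       }
     }
   }
   ```

   In this model, *cancel prepare case 2* resolves itself: om1 first restarts prepared off the stale marker file, but the snapshot install then sees a prepare index behind the txn index and cancels.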
   
   Let me know your thoughts on this, and whether I missed/misunderstood anything.

