devmadhuu opened a new pull request, #8098:
URL: https://github.com/apache/ozone/pull/8098

   ## What changes were proposed in this pull request?
   This PR handles failures of the Recon bootstrap and of its OM tasks.
   If any OM task fails while Recon is bootstrapping (from a full OM DB snapshot), the failed OM tasks must be detected so that Recon can bootstrap and reprocess them again. If the OM DB tarball is received only partially or is corrupted, Recon should delete the tarball and restart the fetch of the OM DB tarball from scratch.
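
How this PR detects a partial or corrupted tarball is not described here. As one possible illustration, the Python sketch below reads an archive end to end and deletes it if reading fails, so that the caller can fetch it again from scratch. The function name `discard_if_corrupt` is hypothetical, and Recon itself is written in Java, so this only shows the idea and not the actual implementation.

```python
import os
import tarfile


def discard_if_corrupt(tar_path: str) -> bool:
    """Return True if the tarball was unusable and has been removed.

    Hypothetical helper: a missing file also counts as unusable.
    """
    try:
        with tarfile.open(tar_path) as tar:
            for member in tar:
                if not member.isfile():
                    continue
                extracted = tar.extractfile(member)
                # Read every byte so a truncated member is detected.
                while extracted.read(1 << 20):
                    pass
    except (tarfile.TarError, EOFError, OSError):
        # Partial or corrupted download: drop it so the fetch restarts.
        if os.path.exists(tar_path):
            os.remove(tar_path)
        return True
    return False
```

A caller would retry the full snapshot download whenever this returns `True`, rather than trying to resume from the leftover file.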
   
   **The following cases can occur, and this change handles each of them accordingly.**
   
   ```
   Case 1: Normal bootstrap flow will take care of this scenario.
   full snapshot: DB not Updated
     - Om Snapshot number - 0
     - Om Delta snapshot number - 0
     - All Om Tasks snapshot number - 0
   
   Case 2: Recon reprocesses only those OM tasks whose last updated sequence number is zero.
   full snapshot: DB Updated, some Tasks not reprocessed, Recon restarted or crashed
     - Om Snapshot number - 100000
     - Om Delta snapshot number - 0
     - Few Om Tasks snapshot number - 0, remaining Om Tasks snapshot number - 100000
   
   Case 3: Recon reprocesses all OM tasks.
   full snapshot: DB Updated, Tasks not reprocessed, Recon restarted or crashed
     - Om Snapshot number - 100000
     - Om Delta snapshot number - 0
     - All Om Tasks snapshot number - 0
   
   Case 4: No OM task is reprocessed; on restart of Recon, the normal bootstrap flow is sufficient.
   full snapshot: DB Updated, Tasks reprocessed, Recon restarted or crashed before the delta DB updates were applied
     - Om Snapshot number - 100000
     - Om Delta snapshot number - 0
     - All Om Tasks snapshot number - 100000
   
   Case 5: Recon reprocesses all OM tasks.
   full snapshot: DB Updated, Tasks reprocessed, delta DB updates also applied, Recon restarted or crashed, but no task processed the delta updates
     - Om Snapshot number - 100000
     - Om Delta snapshot number - 100010
     - All Om Tasks snapshot number - 100000   
   
   Case 6: Recon reprocesses only those OM tasks whose last updated sequence number is less than the Om Delta snapshot number.
   full snapshot: DB Updated, Tasks reprocessed, delta DB updates also applied, Recon restarted or crashed, but only some tasks processed the delta updates
     - Om Snapshot number - 100000
     - Om Delta snapshot number - 100010
     - Few Om Tasks snapshot number - 100000, remaining Om Tasks snapshot number - 100010
   ```
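
The six cases appear to reduce to one comparison: each task's last updated sequence number against the newest sequence number applied to Recon's OM DB. A minimal Python sketch of that reading follows. It is not the code in this PR (Recon is written in Java), and `ReconState`, `plan_recovery`, and the task names are hypothetical names chosen for illustration. The single-comparison rule is an assumption inferred from the cases listed, not something the PR states.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ReconState:
    """Sequence numbers Recon is assumed to persist (hypothetical shape)."""
    om_snapshot_seq: int       # "Om Snapshot number"
    om_delta_seq: int          # "Om Delta snapshot number" (0 if no delta applied)
    task_seqs: Dict[str, int]  # last updated sequence number per OM task


def plan_recovery(state: ReconState) -> Tuple[str, List[str]]:
    """Return (action, task names) to run when Recon restarts.

    Assumption: a task is stale when its sequence number is lower than
    the newest sequence number applied to the local OM DB.
    """
    db_seq = max(state.om_snapshot_seq, state.om_delta_seq)
    if db_seq == 0:
        # Case 1: nothing fetched yet, the normal bootstrap flow applies.
        return ("BOOTSTRAP", [])
    stale = sorted(name for name, seq in state.task_seqs.items() if seq < db_seq)
    if not stale:
        # Case 4: every task is already caught up with the DB.
        return ("NONE", [])
    # Cases 2, 3, 5 and 6: reprocess only the tasks that lag behind.
    return ("REPROCESS", stale)


if __name__ == "__main__":
    case2 = ReconState(100000, 0, {"task_a": 0, "task_b": 100000})
    case6 = ReconState(100000, 100010, {"task_a": 100000, "task_b": 100010})
    print(plan_recovery(case2))
    print(plan_recovery(case6))
```

Under this rule, Case 2 and Case 6 differ only in which number the lagging tasks are compared against: the full snapshot number or the delta snapshot number.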
   
   ## What is the link to the Apache JIRA
   https://issues.apache.org/jira/browse/HDDS-12615
   
   ## How was this patch tested?
   This patch was tested manually and with a local Docker cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

