Devesh Kumar Singh created HDDS-12475:
-----------------------------------------

             Summary: Ozone Recon - Handle auto recovery of data during error in reinitialising of all OM tasks when fetching full snapshot of OM DB
                 Key: HDDS-12475
                 URL: https://issues.apache.org/jira/browse/HDDS-12475
             Project: Apache Ozone
          Issue Type: Task
          Components: Ozone Recon
            Reporter: Devesh Kumar Singh
            Assignee: Devesh Kumar Singh


Ozone Recon - Handle automatic recovery of data when reinitialising any of the 
OM tasks fails after Recon fetches a full snapshot of the OM DB.

There are a few edge cases:
 * If Recon is stopped for some time, OM DB compaction during that downtime 
may force Recon, when it comes back online, to fetch a full snapshot and 
reinitialise all OM-based Recon tasks. If any OM task fails to reinitialise in 
this flow, lastRunTaskStatus records the failure, but in the next OM sync 
iteration the failed task may proceed with delta updates even though it has 
completely missed the OM DB updates from the last full snapshot run.
 * Even if Recon runs continuously in the cluster, it may start lagging over 
time if the OM DB write TPS in the cluster is very high. In that case Recon 
falls back to fetching a full snapshot and reinitialising all OM-based Recon 
tasks. If any OM task fails to reinitialise in this flow, lastRunTaskStatus 
records the failure, but in the next OM sync iteration the failed task may 
proceed with delta updates even though it has completely missed the OM DB 
updates from the last full snapshot run.

In both edge cases, these failures may go completely silent and unnoticed. 
Even an existing metric such as lastRunTaskStatus, which recorded the failure, 
may be overwritten by the status of the next delta run, which may be success.
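
A minimal sketch of one possible recovery approach, written in Java since Ozone is a Java project. Everything in it is hypothetical: OmTask, reinitialiseAll(), syncDelta(), runReprocess() and pendingReinit are illustrative names, not Recon's actual classes or APIs. The idea is to keep a sticky record of tasks whose reinitialisation failed and, on the next sync iteration, retry the full reprocess for those tasks instead of handing them delta updates. Because this record is kept separately from any last-run status, a later successful delta run cannot hide the earlier failure.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not Recon's actual code: all names are illustrative.
class ReconTaskRecoverySketch {

  /** Minimal stand-in for an OM-based Recon task (assumed interface). */
  interface OmTask {
    String name();
    /** Rebuild task state from the full OM DB; true on success. */
    boolean reprocess();
    /** Apply delta updates from the OM DB; true on success. */
    boolean process();
  }

  private final Map<String, OmTask> tasks = new LinkedHashMap<>();
  // Sticky record of tasks whose reinitialisation failed. Unlike a
  // last-run status, a later successful delta run does not clear it.
  private final Set<String> pendingReinit = new LinkedHashSet<>();
  private final List<String> events = new ArrayList<>();

  void register(OmTask task) {
    tasks.put(task.name(), task);
  }

  /** Invoked after Recon falls back to fetching a full OM DB snapshot. */
  void reinitialiseAll() {
    for (OmTask task : tasks.values()) {
      runReprocess(task);
    }
  }

  /** Invoked on each regular (delta) OM sync iteration. */
  void syncDelta() {
    for (OmTask task : tasks.values()) {
      if (pendingReinit.contains(task.name())) {
        // Deltas alone would miss the updates the failed reinitialisation
        // never applied, so retry the full reprocess instead.
        runReprocess(task);
      } else if (!task.process()) {
        events.add("delta-failed:" + task.name());
      }
    }
  }

  private void runReprocess(OmTask task) {
    if (task.reprocess()) {
      // Set.remove returns true only if the task had been pending.
      if (pendingReinit.remove(task.name())) {
        events.add("recovered:" + task.name());
      }
    } else {
      pendingReinit.add(task.name());
      events.add("reinit-failed:" + task.name());
    }
  }

  Set<String> pendingReinit() {
    return pendingReinit;
  }

  List<String> events() {
    return events;
  }
}
```

In this sketch a task that fails reprocess() after a full snapshot stays in pendingReinit until a later reprocess() succeeds, and the events list keeps both the failure and the recovery visible rather than letting one status value overwrite the other.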



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
