devmadhuu opened a new pull request, #3947:
URL: https://github.com/apache/ozone/pull/3947

   Root Cause:
   If Recon is down during the window in which a new container is created 
and goes missing, Recon has no way to learn about it: the datanodes were 
down, so nothing sent container reports to Recon. Recon also does not sync 
with the SCM DB in this case. Recon's copy of the SCM RocksDB is updated 
and synced with the SCM RocksDB only if the difference in container count 
between SCM and Recon is greater than the threshold defined in 
"ozone.recon.scm.container.threshold", and this check runs only at Recon 
startup. After startup, Recon depends solely on container reports from the 
datanodes.
   
   Apart from the above, there is another design point: when 
"ozone.scm.ratis.enable" is true (the default), the SCM RocksDB is not 
updated with container data immediately. Data is flushed to the SCM RocksDB 
only when a Ratis snapshot is taken, which depends on the 
"dfs.ratis.snapshot.threshold" value (default 10000). So even a periodic 
sync with the SCM RocksDB would give no new information until the SCM 
RocksDB has been updated or flushed.
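   
   To illustrate why a single new container is missed, here is a minimal 
sketch of the startup-only threshold check described above. The class and 
method names are hypothetical, and the use of an absolute difference is an 
assumption; this is not the actual Recon code.

```java
/** Hypothetical sketch of the startup-only check; not the actual Recon code. */
final class StartupSyncCheck {
    private StartupSyncCheck() { }

    /**
     * Returns true when the container-count difference between SCM and Recon
     * is greater than the threshold, i.e. the configured value of
     * "ozone.recon.scm.container.threshold". Taking the absolute difference
     * is an assumption made for this sketch.
     */
    static boolean shouldSyncAtStartup(long scmContainerCount,
                                       long reconContainerCount,
                                       long threshold) {
        return Math.abs(scmContainerCount - reconContainerCount) > threshold;
    }
}
```

   With any threshold of 1 or more, one container created while Recon was 
down leaves the difference at or below the threshold, so no sync is 
triggered.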
   
   Changes done to fix the issue:
   Added a periodic sync of the Recon container cache from the SCM container 
cache, using the SCM listContainer API exposed on the RPC port.
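   
   A rough sketch of how such a periodic sync could look is below. It is 
not the code in this patch: `ScmContainerLister`, `ReconContainerSync`, the 
paging contract, and the use of a plain `Set<Long>` as the Recon container 
cache are all assumptions made for illustration.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for an SCM client; not the real Ozone interface. */
interface ScmContainerLister {
    /** Returns up to {@code count} container IDs >= startId, ascending. */
    List<Long> listContainers(long startId, int count);
}

/** Hypothetical sketch of a periodic Recon-from-SCM container sync. */
final class ReconContainerSync {
    private final ScmContainerLister scm;
    private final Set<Long> reconContainers; // stand-in for Recon's cache
    private final int batchSize;

    ReconContainerSync(ScmContainerLister scm, Set<Long> reconContainers,
                       int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.scm = scm;
        this.reconContainers = reconContainers;
        this.batchSize = batchSize;
    }

    /** One pass: pages through SCM's list and adds IDs Recon lacks. */
    synchronized int syncOnce() {
        int added = 0;
        long startId = 0;
        while (true) {
            List<Long> batch = scm.listContainers(startId, batchSize);
            for (long id : batch) {
                if (reconContainers.add(id)) {
                    added++;
                }
            }
            if (batch.size() < batchSize) {
                break; // short (or empty) page means the listing is done
            }
            startId = batch.get(batch.size() - 1) + 1;
        }
        return added;
    }

    /** Runs syncOnce() repeatedly with a fixed delay between passes. */
    ScheduledExecutorService startPeriodicSync(long intervalSeconds) {
        ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();
        executor.scheduleWithFixedDelay(() -> {
            try {
                syncOnce();
            } catch (RuntimeException e) {
                // Swallow so one failed pass does not cancel later passes.
            }
        }, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
        return executor;
    }
}
```

   Reading from the SCM's in-memory container list rather than its RocksDB 
sidesteps the Ratis snapshot flush delay described under Root Cause.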
   
   https://issues.apache.org/jira/browse/HDDS-3486
   
   ## How was this patch tested?
   
   Tested with test cases, and manually by bringing Recon down, adding a new 
container in SCM, and making it a missing container.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

