swamirishi commented on PR #5035: URL: https://github.com/apache/ozone/pull/5035#issuecomment-1629931455
> Just a side note. As I was reviewing this PR, I realized there may be other potential problem areas here. When a follower tries to catch up with the leader, it should start from an absolutely empty initial state. That is because we cannot assume that the sst files on the leader and the follower will have the same content. The RocksDB instances and the contents of the RocksDB directory on the OM nodes are only logically equal; physically, the data could be spread over completely different sst files.
> cc: @GeorgeJahad @smengcl @hemantk-12

This sst file walk happens on the candidate directory, and we exclude the list of sst files that are already present.

There is also a race condition when multiple requests to download and install the snapshot from the leader arrive together. The problem with multiple requests is that one request might be copying data from the checkpoint directory while another request is listing (`ls`) that same directory. So if one request finishes the installation and starts the sst filtering service, we might end up deleting files that have not been copied yet, while the manifest files are still getting updated.

Thus this function also needs to be synchronized, to ensure that only one thread is downloading and installing the snapshot at a time. It will be eventually consistent.
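To illustrate the synchronization suggested above, here is a minimal sketch and not the actual Ozone code. The names `SnapshotInstallGuard`, `downloadAndInstall` and `SnapshotInstallDemo` are hypothetical, and the real download, copy and sst filtering steps are replaced by a short sleep. It only shows that a `synchronized` method lets one request at a time run the whole download-and-install sequence.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical guard; stands in for the function that downloads and installs the snapshot. */
class SnapshotInstallGuard {
  private final AtomicInteger active = new AtomicInteger();
  private final AtomicInteger maxActive = new AtomicInteger();
  private final AtomicInteger completed = new AtomicInteger();

  // synchronized: a second request blocks here until the first one has
  // finished copying files and any follow-up cleanup has run.
  synchronized void downloadAndInstall(Runnable steps) {
    int now = active.incrementAndGet();
    maxActive.accumulateAndGet(now, Math::max); // record peak concurrency
    try {
      steps.run();
    } finally {
      active.decrementAndGet();
      completed.incrementAndGet();
    }
  }

  int maxConcurrentInstalls() {
    return maxActive.get();
  }

  int completedInstalls() {
    return completed.get();
  }
}

class SnapshotInstallDemo {
  /** Fires several install requests at once and returns the guard for inspection. */
  static SnapshotInstallGuard runConcurrently(int requests) {
    SnapshotInstallGuard guard = new SnapshotInstallGuard();
    ExecutorService pool = Executors.newFixedThreadPool(requests);
    for (int i = 0; i < requests; i++) {
      pool.execute(() -> guard.downloadAndInstall(() -> {
        try {
          Thread.sleep(10); // placeholder for download + copy + install
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }));
    }
    pool.shutdown();
    try {
      pool.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return guard;
  }

  public static void main(String[] args) {
    SnapshotInstallGuard guard = runConcurrently(4);
    System.out.println("max concurrent installs: " + guard.maxConcurrentInstalls());
    System.out.println("completed installs: " + guard.completedInstalls());
  }
}
```

With the `synchronized` keyword removed, several requests could be inside `downloadAndInstall` at the same time, which is the situation described above where one request's cleanup can delete files another request has not copied yet.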
