swamirishi commented on PR #5035:
URL: https://github.com/apache/ozone/pull/5035#issuecomment-1629931455

   > Just a side note. As I was reviewing this PR, I realized we may have other 
potential problem areas here. When a follower tries to catch up with the 
leader, it should start from a completely empty initial state. That is because 
we cannot assume that the sst files on the leader and the follower have the 
same content. The RocksDB instances and the contents of the RocksDB directory 
on the OM nodes are only logically equal; physically they could be spread over 
completely different sst files. cc: @GeorgeJahad @smengcl @hemantk-12
   The sst file walk happens on the candidate directory, and we exclude the 
sst files that are already present. There is also a race condition when 
multiple requests to download & install the snapshot from the leader arrive 
at the same time. The problem is that one request might still be copying data 
from the checkpoint directory while another request is doing the ls on that 
same directory. So if one request finishes the installation & starts the sst 
filtering service, we might end up deleting files which have not been copied 
yet, while the manifest files are still getting updated. Thus this function 
also needs to be synchronized, to ensure that only one thread is downloading & 
installing the snapshot at a time. It will be eventually consistent.
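
   As a rough sketch of the synchronization suggested above (not the actual 
change in this PR): the class name `SnapshotInstallGuard` and the method 
`downloadAndInstall` are hypothetical, and the real download/install logic is 
passed in as a `Supplier` so the example stays self-contained.

```java
import java.util.function.Supplier;

/**
 * Hypothetical guard, not Ozone's actual class. It serializes the whole
 * "download + install snapshot" step so that only one thread runs it at a time.
 */
final class SnapshotInstallGuard {

  /**
   * Runs the given step while holding this object's monitor. A second caller
   * blocks until the first one has finished both the copy and the install,
   * so it can never list or clean a directory that is still being written.
   */
  synchronized <T> T downloadAndInstall(Supplier<T> step) {
    return step.get();
  }
}
```

Note that `synchronized` on an instance method only serializes callers that 
share the same instance, so a sketch like this assumes a single guard object 
per OM.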
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

