Apache9 commented on PR #4966:
URL: https://github.com/apache/hbase/pull/4966#issuecomment-1457507133

   The last pushed sequence id column family is used to prevent a replication 
from starting before the previous ones finish. So after a restart, if there are 
some values left, there are only two possible results:
   1. It indicates an older last pushed sequence id, so the new edits cannot be 
pushed and the replication will hang there.
   2. We have just finished the replication for a whole region's lifetime, so the 
new replication can go ahead.
   In both cases we will not replicate unnecessary edits, so we are free to 
bring up the cluster and then try to fix the data.
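
   To illustrate the two cases, here is a minimal sketch of the gating check, 
not the actual HBase code. The class name, the method name and the 
`previousRangeEnd` parameter are hypothetical; it only assumes that a new range 
may start once the recorded last pushed sequence id has reached the end of the 
previous range.

```java
// Simplified model of the "do not start before the previous ones finish" rule.
// All names here are hypothetical and are not taken from the HBase code base.
final class SerialGateSketch {

  /**
   * @param lastPushedSeqId  the value found in the last pushed sequence id family
   * @param previousRangeEnd the last sequence id that belongs to the previous range
   * @return true if edits of the new range may be pushed
   */
  static boolean canStartNewRange(long lastPushedSeqId, long previousRangeEnd) {
    // Case 1: a stale (older) value is below the end of the previous range,
    //         so the check fails and replication waits.
    // Case 2: the previous range was fully replicated, so the check passes.
    return lastPushedSeqId >= previousRangeEnd;
  }
}
```

   In this model a leftover value can only block or allow the push. It never 
causes extra edits to be shipped.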
   
   The hfile refs family is only used to tell the cleaner not to delete some 
hfiles, so we are also free to bring the cluster up first and then try to fix 
the data.
   
   So I think we can write a flag file under the ReplicationSyncUp directory 
that includes the timestamp at which we started the sync up tool. After restart, 
we delete all the data in the last pushed sequence id and hfile refs families 
whose timestamp is less than the timestamp in the flag file. We could use a 
procedure to do this, and once it is done, we remove the flag file as one of the 
procedure's steps.
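
   A rough sketch of the flag file idea, using the local file system and an 
in-memory map in place of HDFS and the replication table. The flag file name, 
the class and the method names are all assumptions made for illustration; the 
real implementation would be a procedure on the master.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

// Hypothetical sketch; none of these names exist in HBase.
final class SyncUpFlagSketch {

  // Assumed flag file name; the proposal does not name one.
  static final String FLAG_FILE = "sync-up.flag";

  /** Called when the sync up tool starts: record the start timestamp. */
  static void writeFlag(Path syncUpDir, long startTimestamp) throws IOException {
    Files.createDirectories(syncUpDir);
    Files.write(syncUpDir.resolve(FLAG_FILE),
      Long.toString(startTimestamp).getBytes(StandardCharsets.UTF_8));
  }

  /**
   * Called after restart. rowTimestamps stands in for the cells of the last
   * pushed sequence id and hfile refs families (row key to cell timestamp).
   * Returns the number of entries removed.
   */
  static int cleanUp(Path syncUpDir, Map<String, Long> rowTimestamps) throws IOException {
    Path flag = syncUpDir.resolve(FLAG_FILE);
    if (!Files.exists(flag)) {
      // No sync up was run, or a previous cleanup already completed.
      return 0;
    }
    long cutoff = Long.parseLong(
      new String(Files.readAllBytes(flag), StandardCharsets.UTF_8).trim());
    int before = rowTimestamps.size();
    // Delete everything written before the sync up tool started.
    rowTimestamps.values().removeIf(ts -> ts < cutoff);
    // Last step: remove the flag so the cleanup is not repeated.
    Files.delete(flag);
    return before - rowTimestamps.size();
  }
}
```

   Because the flag file is removed only after the deletes, a crash in the 
middle would simply cause the cleanup to run again on the next attempt.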
   
   @2005hithlj WDYT? I will try to implement this soon if there are no big concerns.
   
   Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

