+1
张铎(Duo Zhang) <[email protected]> 于2023年5月10日周三 21:20写道: > > Oh, it seems finally the 3 VOTE emails are all sent... > > Sorry for the spam... > > Liangjun He <[email protected]> 于2023年5月10日周三 19:36写道: > > > +1 > > > > > > At 2023-05-10 01:13:12, "张铎(Duo Zhang)" <[email protected]> wrote: > > >The issue is about moving replication queue storage from zookeeper to a > > >hbase table. This is the last piece of persistent data on zookeeper. So > > >after this feature merged, we are finally fine to say that all data on > > >zookeeper can be removed while restarting a cluster. > > > > > >Let me paste the release note here > > > > > >We introduced a table based replication queue storage in this issue. The > > >> queue data will be stored in hbase:replication table. This is the last > > >> piece of persistent data on zookeeper. So after this change, we are OK > > to > > >> clean up all the data on zookeeper, as now they are all transient, a > > >> cluster restarting can fix everything. > > >> > > >> The data structure has been changed a bit as now we only support an > > offset > > >> for a WAL group instead of storing all the WAL files for a WAL group. > > >> Please see the replication internals section in our ref guide for more > > >> details. > > >> > > >> To break the cyclic dependency issue, i.e, creating a new WAL writer > > >> requires writing to replication queue storage first but with table based > > >> replication queue storage, you first need a WAL writer when you want to > > >> update to table, now we will not record a queue when creating a new WAL > > >> writer instance. The downside for this change is that, the logic for > > >> claiming queue and WAL cleaner are much more complicated. See > > >> AssignReplicationQueuesProcedure and ReplicationLogCleaner for more > > details > > >> if you have interest. > > >> > > >> Notice that, we will use a separate WAL provider for hbase:replication > > >> table, so you will see a new WAL file for the region server which holds > > the > > >> hbase:replication table. If we do not do this, the update to > > >> hbase:replication table will also generate some WAL edits in the WAL > > file > > >> we need to track in replication, and then lead to more updates to > > >> hbase:replication table since we have advanced the replication offset. > > In > > >> this way we will generate a lot of garbage in our WAL file, even if we > > >> write nothing to the cluster. So a separated WAL provider which is not > > >> tracked by replication is necessary here. > > >> > > >> The data migration will be done automatically during rolling upgrading, > > of > > >> course the migration via a full cluster restart is also supported, but > > >> please make sure you restart master with new code first. The replication > > >> peers will be disabled during the migration and no claiming queue will > > be > > >> scheduled at the same time. So you may see a lot of unfinished SCPs > > during > > >> the migration but do not worry, it will not block the normal failover, > > all > > >> regions will be assigned. The replication peers will be enabled again > > after > > >> the migration is done, no manual operations needed. > > >> > > >> The ReplicationSyncUp tool is also affected. The goal of this tool is to > > >> replicate data to peer cluster while the source cluster is down. But if > > we > > >> store the replication queue data in a hbase table, it is impossible for > > us > > >> to get the newest data if the source cluster is down. 
So here we choose > > to > > >> read from the region directory directly to load all the replication > > queue > > >> data in memory, and do the sync up work. We may lose the newest data so > > in > > >> this way we need to replicate more data but it will not affect > > >> correctness. > > >> > > > > > > The nightly job is here > > > > > > > > https://ci-hbase.apache.org/job/HBase%20Nightly/job/HBASE-27109%252Ftable_based_rqs/ > > > > > >Mostly fine, the failed UTs are not related and are flaky, for example, > > >build #73, the failed UT is TestAdmin1.testCompactionTimestamps, which is > > >not related to replication and it only failed in jdk11 build but passed in > > >jdk8 build. > > > > > >This is the PR against the master branch. > > > > > >https://github.com/apache/hbase/pull/5202 > > > > > >The PR is big as we have 16 commits on the feature branch. > > > > > >The VOTE will be open for at least 72 hours. > > > > > >[+1] Agree > > >[+0] Neutral > > >[-1] Disagree (please include actionable feedback) > > > > > >Thanks. > >
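
For anyone trying to picture the data-structure change described in the release note (a single offset per WAL group instead of a list of WAL files per group), here is a minimal, hypothetical Java sketch. The class and field names are illustrative only and are not the real classes from the HBASE-27109 branch; the authoritative description is the replication internals section of the ref guide.

  // Hypothetical sketch of the queue model described in the release note: for each
  // (peer, region server) queue, every WAL group keeps a single replication offset
  // (current WAL file + position) instead of the full list of WAL files.
  // The names below are illustrative and are NOT the real HBase classes.
  import java.util.HashMap;
  import java.util.Map;

  public class QueueOffsetSketch {

    /** Replication offset for one WAL group: the WAL file and the position reached in it. */
    static final class GroupOffset {
      final String walFile;
      final long position;

      GroupOffset(String walFile, long position) {
        this.walFile = walFile;
        this.position = position;
      }

      @Override
      public String toString() {
        return walFile + "@" + position;
      }
    }

    /** Replication queue for one peer on one region server. */
    static final class PeerQueue {
      final String peerId;
      final String serverName;
      // WAL group name -> single offset, replacing the old per-group list of WAL files.
      final Map<String, GroupOffset> offsets = new HashMap<>();

      PeerQueue(String peerId, String serverName) {
        this.peerId = peerId;
        this.serverName = serverName;
      }

      /** Advancing replication simply overwrites the group's offset. */
      void advance(String walGroup, String walFile, long position) {
        offsets.put(walGroup, new GroupOffset(walFile, position));
      }
    }

    public static void main(String[] args) {
      PeerQueue queue = new PeerQueue("peer_1", "rs1.example.com,16020,1683700000000");
      queue.advance("rs1.example.com%2C16020%2C1683700000000",
          "rs1.example.com%2C16020%2C1683700000000.1683700012345", 4096L);
      System.out.println(queue.peerId + "/" + queue.serverName + " -> " + queue.offsets);
    }
  }

The point of the change shows up in advance(): updating a WAL group is a single small overwrite, which is what makes keeping this state as one value per group in hbase:replication natural.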

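Since the queue data now lives in the hbase:replication table rather than on zookeeper, it can in principle be inspected like any other table once the cluster is up. Below is a minimal, unofficial sketch using only the standard client API (Connection/Table/Scan); it assumes you have permission to read the system table, and it makes no assumption about the row/column schema, which is defined by the table based replication queue storage implementation and documented in the ref guide.

  // Unofficial helper: dump the raw contents of hbase:replication with a plain scan.
  // This uses only the public client API; the row/column layout it prints is defined
  // by the table based replication queue storage and may change between versions.
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.Cell;
  import org.apache.hadoop.hbase.CellUtil;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.ConnectionFactory;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.client.Table;
  import org.apache.hadoop.hbase.util.Bytes;

  public class DumpReplicationQueueTable {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      try (Connection conn = ConnectionFactory.createConnection(conf);
          Table table = conn.getTable(TableName.valueOf("hbase:replication"));
          ResultScanner scanner = table.getScanner(new Scan())) {
        for (Result result : scanner) {
          System.out.println("row: " + Bytes.toStringBinary(result.getRow()));
          for (Cell cell : result.rawCells()) {
            System.out.println("  " + Bytes.toStringBinary(CellUtil.cloneFamily(cell)) + ":"
                + Bytes.toStringBinary(CellUtil.cloneQualifier(cell)) + " = "
                + Bytes.toStringBinary(CellUtil.cloneValue(cell)));
          }
        }
      }
    }
  }

A scan 'hbase:replication' from the shell gives the same raw view; the code form is just handier if you want to post-process the offsets.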