+1

At 2023-05-10 01:13:12, "张铎(Duo Zhang)" <[email protected]> wrote:
>The issue is about moving replication queue storage from zookeeper to an
>hbase table. This is the last piece of persistent data on zookeeper, so
>after this feature is merged we can finally say that all data on
>zookeeper can be removed while restarting a cluster.
>
>Let me paste the release note here
>
>> We introduced a table based replication queue storage in this issue. The
>> queue data will be stored in the hbase:replication table. This is the
>> last piece of persistent data on zookeeper, so after this change we are
>> OK to clean up all the data on zookeeper, as it is now all transient and
>> a cluster restart can fix everything.
>>
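For anyone who wants to poke at the new storage, below is a minimal sketch
that dumps hbase:replication with the plain client API. The row layout is
internal and may change between releases, so treat it as a way to eyeball
the data only, not as a supported interface.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;

public class DumpReplicationQueueTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("hbase:replication"));
         ResultScanner scanner = table.getScanner(new Scan())) {
      for (Result result : scanner) {
        // Each row holds queue/offset data for a replication peer.
        System.out.println(result);
      }
    }
  }
}
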
>> The data structure has been changed a bit: we now only store an offset
>> for each WAL group instead of storing all the WAL files for the group.
>> Please see the replication internals section in our ref guide for more
>> details.
>>
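To make the data structure change concrete, here is a purely illustrative
sketch (these are not the actual HBase classes) of keeping one file name
plus byte offset per WAL group instead of the full list of files:

import java.util.HashMap;
import java.util.Map;

// Illustrative only: per WAL group we remember just the current WAL file
// and an offset within it, not every file that still needs replication.
public class GroupOffsetSketch {
  // walGroup -> "currentWalFile#offset"
  private final Map<String, String> offsets = new HashMap<>();

  void advance(String walGroup, String walFile, long offset) {
    offsets.put(walGroup, walFile + "#" + offset);
  }

  public static void main(String[] args) {
    GroupOffsetSketch sketch = new GroupOffsetSketch();
    sketch.advance("group-1", "host%2C16020%2C1683000000000.1683000123456", 4096L);
    // Only one entry per group, no matter how many WAL files exist.
    System.out.println(sketch.offsets);
  }
}
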
>> There is a cyclic dependency issue: creating a new WAL writer requires
>> writing to the replication queue storage first, but with table based
>> replication queue storage you first need a WAL writer when you want to
>> update the table. To break the cycle, we no longer record a queue when
>> creating a new WAL writer instance. The downside of this change is that
>> the logic for claiming queues and for the WAL cleaner is much more
>> complicated. See AssignReplicationQueuesProcedure and
>> ReplicationLogCleaner for more details if you are interested.
>>
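If the cyclic dependency is hard to picture, this toy example (not HBase
code) shows the recursion you would get if creating a WAL writer still
recorded a queue once the queue storage itself writes to an hbase table:

public class CycleSketch {
  interface QueueStorage {
    void recordQueue(String walGroup);
  }

  static class WalFactory {
    QueueStorage storage; // wired up after construction

    void createWal(String walGroup) {
      // If creating a WAL writer recorded the queue here ...
      storage.recordQueue(walGroup);
    }
  }

  static class TableQueueStorage implements QueueStorage {
    private final WalFactory walFactory;

    TableQueueStorage(WalFactory walFactory) {
      this.walFactory = walFactory;
    }

    @Override
    public void recordQueue(String walGroup) {
      // ... then the put to hbase:replication would need a WAL writer itself ...
      walFactory.createWal("hbase:replication-group");
    }
  }

  public static void main(String[] args) {
    WalFactory factory = new WalFactory();
    factory.storage = new TableQueueStorage(factory);
    // ... and we would recurse forever (StackOverflowError), hence the cycle.
    factory.createWal("normal-group");
  }
}
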
>> Notice that we use a separate WAL provider for the hbase:replication
>> table, so you will see an extra WAL file on the region server which
>> holds the hbase:replication table. If we did not do this, updates to the
>> hbase:replication table would also generate WAL edits in a WAL file that
>> is tracked by replication, which would then lead to more updates to the
>> hbase:replication table since the replication offset has advanced. This
>> way we would generate a lot of garbage in the WAL file even if nothing
>> is written to the cluster, so a separate WAL provider which is not
>> tracked by replication is necessary here.
>>
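To see the extra WAL file mentioned above you can just list the region
server's WAL directory; a rough sketch using the Hadoop FileSystem API,
assuming the default layout where WALs live under
<hbase.rootdir>/WALs/<server-name>:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListRegionServerWals {
  public static void main(String[] args) throws Exception {
    // args[0] = WAL directory of the region server hosting hbase:replication,
    // for example /hbase/WALs/host,16020,1683000000000 (adjust to your rootdir).
    FileSystem fs = FileSystem.get(new Configuration());
    for (FileStatus status : fs.listStatus(new Path(args[0]))) {
      // The file from the separate replication WAL provider shows up here
      // alongside the normal WAL files.
      System.out.println(status.getPath().getName());
    }
  }
}
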
>> The data migration will be done automatically during a rolling upgrade.
>> Migration via a full cluster restart is also supported, but please make
>> sure you restart the master with the new code first. The replication
>> peers will be disabled during the migration and no queue claiming will
>> be scheduled at the same time, so you may see a lot of unfinished SCPs
>> (ServerCrashProcedures) during the migration. Do not worry, this will
>> not block normal failover and all regions will be assigned. The
>> replication peers will be enabled again after the migration is done; no
>> manual operations are needed.
>>
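After the migration finishes, a quick way to confirm the peers really came
back is the public Admin API; just a sketch:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.replication.ReplicationPeerDescription;

public class CheckPeersEnabled {
  public static void main(String[] args) throws Exception {
    try (Connection conn =
           ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = conn.getAdmin()) {
      for (ReplicationPeerDescription peer : admin.listReplicationPeers()) {
        // All peers should report enabled=true once the migration is done.
        System.out.println(peer.getPeerId() + " enabled=" + peer.isEnabled());
      }
    }
  }
}
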
>> The ReplicationSyncUp tool is also affected. The goal of this tool is to
>> replicate data to the peer cluster while the source cluster is down. But
>> if we store the replication queue data in an hbase table, it is
>> impossible for us to get the newest data while the source cluster is
>> down. So here we choose to read the region directory directly to load
>> all the replication queue data into memory and do the sync up work.
>> Since we may miss the newest offsets, we may replicate more data than
>> strictly needed, but it will not affect correctness.
>>
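For reference, ReplicationSyncUp can still be driven the usual way, e.g.
from Java through ToolRunner (normally you would just launch the class via
the hbase script); this is a sketch and assumes the tool keeps its Tool
interface:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.replication.regionserver.ReplicationSyncUp;
import org.apache.hadoop.util.ToolRunner;

public class RunReplicationSyncUp {
  public static void main(String[] args) throws Exception {
    // Runs the sync-up tool using the (stopped) source cluster's configuration.
    int exitCode =
      ToolRunner.run(HBaseConfiguration.create(), new ReplicationSyncUp(), args);
    System.exit(exitCode);
  }
}
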
>
> The nightly job is here
>
>https://ci-hbase.apache.org/job/HBase%20Nightly/job/HBASE-27109%252Ftable_based_rqs/
>
>Mostly fine; the failed UTs are unrelated and flaky. For example, in
>build #73 the failed UT is TestAdmin1.testCompactionTimestamps, which is
>not related to replication and only failed in the jdk11 build while
>passing in the jdk8 build.
>
>This is the PR against the master branch.
>
>https://github.com/apache/hbase/pull/5202
>
>The PR is big as we have 16 commits on the feature branch.
>
>The VOTE will be open for at least 72 hours.
>
>[+1] Agree
>[+0] Neutral
>[-1] Disagree (please include actionable feedback)
>
>Thanks.
