Apache9 commented on code in PR #5203:
URL: https://github.com/apache/hbase/pull/5203#discussion_r1179173386

##########
src/main/asciidoc/_chapters/ops_mgt.adoc:
##########
@@ -2433,26 +2433,22 @@ Replication State Storage::
 `ReplicationPeerStorage` and `ReplicationQueueStorage`. The former one is for storing the
 replication peer related states, and the latter one is for storing the replication queue
 related states.
-HBASE-15867 is only half done, as although we have abstract these two interfaces, we still only
-have zookeeper based implementations.
+And in HBASE-27109, we have implemented the `ReplicationQueueStorage` interface to store the replication queue in the hbase:replication table.

 Replication State in ZooKeeper::
 By default, the state is contained in the base node _/hbase/replication_.
-Usually this nodes contains two child nodes, the `peers` znode is for storing replication peer
-state, and the `rs` znodes is for storing replication queue state.
+After 3.0.0, it only contains one child node, but before 3.0.0, we still use zk to store queue data.

 The `Peers` Znode::
 The `peers` znode is stored in _/hbase/replication/peers_ by default.
 It consists of a list of all peer replication clusters, along with the status of each of them.
 The value of each peer is its cluster key, which is provided in the HBase Shell.
 The cluster key contains a list of ZooKeeper nodes in the cluster's quorum, the client port for the ZooKeeper quorum, and the base znode for HBase in HDFS on that cluster.

-The `RS` Znode::

Review Comment:
   We'd better keep this unchanged, as it describes what we have before 3.0.0. And we can introduce a new section to describe the hbase:replication table storage.



##########
src/main/asciidoc/_chapters/ops_mgt.adoc:
##########
@@ -2433,26 +2433,22 @@ Replication State Storage::
 `ReplicationPeerStorage` and `ReplicationQueueStorage`. The former one is for storing the
 replication peer related states, and the latter one is for storing the replication queue
 related states.
-HBASE-15867 is only half done, as although we have abstract these two interfaces, we still only
-have zookeeper based implementations.
+And in HBASE-27109, we have implemented the `ReplicationQueueStorage` interface to store the replication queue in the hbase:replication table.

 Replication State in ZooKeeper::
 By default, the state is contained in the base node _/hbase/replication_.
-Usually this nodes contains two child nodes, the `peers` znode is for storing replication peer
-state, and the `rs` znodes is for storing replication queue state.
+After 3.0.0, it only contains one child node, but before 3.0.0, we still use zk to store queue data.

Review Comment:
   "Usually this nodes contains two child nodes, the `peers` znode is for storing replication peer state, and the `rs` znodes is for storing replication queue state. And if you choose the file system based replication peer storage, you will not see the `peers` znode. And starting from 3.0.0, we have moved the replication queue state to hbase:replication table, so you will not see the `rs` znode."



##########
src/main/asciidoc/_chapters/ops_mgt.adoc:
##########
@@ -2433,26 +2433,22 @@ Replication State Storage::
 `ReplicationPeerStorage` and `ReplicationQueueStorage`. The former one is for storing the
 replication peer related states, and the latter one is for storing the replication queue
 related states.
-HBASE-15867 is only half done, as although we have abstract these two interfaces, we still only
-have zookeeper based implementations.
+And in HBASE-27109, we have implemented the `ReplicationQueueStorage` interface to store the replication queue in the hbase:replication table.

Review Comment:
   And in HBASE-27110, we have implemented a file system based replication peer storage, to store replication peer state on file system. Of course you can still use the zookeeper based replication peer storage. And in HBASE-27109, we have changed the replication queue storage from zookeeper based to hbase table based. See the below 'Replication Queue State in hbase:replication table' section for more details.



##########
src/main/asciidoc/_chapters/ops_mgt.adoc:
##########
@@ -2475,14 +2471,14 @@ When nodes are removed from the slave cluster, or if nodes go down or come back

 ==== Keeping Track of Logs

-Each master cluster region server has its own znode in the replication znodes hierarchy.
-It contains one znode per peer cluster (if 5 slave clusters, 5 znodes are created), and each of these contain a queue of WALs to process.
+Before 3.0.0, for zookeeper based implementation, it is like a tree, we have a znode for a peer cluster, but under the znode we have lots of files.

Review Comment:
   I think here we'd better make two different sections to describe the logic before and after 3.0.0. As on zookeeper, we store all WAL files on it and for table based solution, we only store an offset.



##########
src/main/asciidoc/_chapters/ops_mgt.adoc:
##########
@@ -2475,14 +2471,14 @@ When nodes are removed from the slave cluster, or if nodes go down or come back

 ==== Keeping Track of Logs

-Each master cluster region server has its own znode in the replication znodes hierarchy.
-It contains one znode per peer cluster (if 5 slave clusters, 5 znodes are created), and each of these contain a queue of WALs to process.
+Before 3.0.0, for zookeeper based implementation, it is like a tree, we have a znode for a peer cluster, but under the znode we have lots of files.
+But after 3.0.0, for table based implementation, we have server name in row key, which means we will have lots of rows for a given peer.
 Each of these queues will track the WALs created by that region server, but they can differ in size.
 For example, if one slave cluster becomes unavailable for some time, the WALs should not be deleted, so they need to stay in the queue while the others are processed.
 See <<rs.failover.details,rs.failover.details>> for an example.

 When a source is instantiated, it contains the current WAL that the region server is writing to.
-During log rolling, the new file is added to the queue of each slave cluster's znode just before it is made available.
+During log rolling, the new file is added to the queue of each slave cluster's record just before it is made available.

Review Comment:
   This is different for table based replication queue storage, and it is the key point here. For zookeeper, it is an external system so there is no problem to let log rolling depend on it, but if we want to store the state in a hbase table, we can not let log rolling depend on it as it will introduce dead lock... We will only write to hbase:replication when want to record an offset after replicating something.



##########
src/main/asciidoc/_chapters/ops_mgt.adoc:
##########
@@ -2521,93 +2517,36 @@ NOTE: WALs are saved when replication is enabled or disabled as long as peers ex

 [[rs.failover.details]]
 ==== Region Server Failover

-When no region servers are failing, keeping track of the logs in ZooKeeper adds no value.
-Unfortunately, region servers do fail, and since ZooKeeper is highly available, it is useful for managing the transfer of the queues in the event of a failure.
-
-Each of the master cluster region servers keeps a watcher on every other region server, in order to be notified when one dies (just as the master does). When a failure happens, they all race to create a znode called `lock` inside the dead region server's znode that contains its queues.

Review Comment:
   I think we still need to keep this, as it is the logic for some hbase releases. Let me check the version where we start to use SCP to claim replication queue.


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
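To make the storage-model contrast discussed in this thread concrete, here is a small, hypothetical Python sketch (plain dictionaries, not HBase or ZooKeeper APIs; every class and method name is illustrative only). It models the reviewer's point: the pre-3.0.0 ZooKeeper layout records every WAL file name under a per-server, per-peer queue (so log rolling must write to it), while the 3.0.0+ hbase:replication layout keeps only one offset row per (peer, server), written only after something has been replicated.

```python
class ZkQueueStorage:
    """Pre-3.0.0 model (toy sketch): a znode tree where every WAL file name
    is stored under a per-(server, peer) queue, so each log roll adds an entry."""

    def __init__(self):
        self.tree = {}  # server -> peer -> list of WAL file names

    def add_wal(self, server, peer, wal):
        # Called at log-roll time: the new WAL name is appended to every peer's queue.
        self.tree.setdefault(server, {}).setdefault(peer, []).append(wal)

    def wals(self, server, peer):
        return list(self.tree.get(server, {}).get(peer, []))


class TableQueueStorage:
    """3.0.0+ model (toy sketch): one row per (peer, server) holding only a
    replication offset (current WAL name and position). Log rolling never
    writes here; the offset is updated only after replicating some edits."""

    def __init__(self):
        self.rows = {}  # (peer, server) -> (WAL file name, position)

    def set_offset(self, peer, server, wal, position):
        # Called after replication progress, overwriting the single offset row.
        self.rows[(peer, server)] = (wal, position)

    def offset(self, peer, server):
        return self.rows.get((peer, server))
```

In this simplified picture, N rolled WALs mean N queue entries in the ZooKeeper model but still only one row in the table model, which is why storing the state in an HBase table cannot be allowed to sit on the log-rolling path.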
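The older failover flow the reviewer wants to keep documented, where surviving region servers race to create a `lock` znode under the dead server's znode and the winner takes over its queues, can be simulated in the same toy style. This is again a hypothetical sketch, not HBase code: the real implementation relies on ZooKeeper's atomic znode creation (and, as the reviewer notes, newer releases claim queues via the Server Crash Procedure instead of this race).

```python
class QueueFailoverSim:
    """Toy model of the pre-SCP failover race: servers race to create a
    'lock' under the dead server's queues; creation is atomic, so exactly
    one claimer wins and transfers the queues to itself."""

    def __init__(self, queues):
        self.queues = queues  # server name -> list of queued WAL names
        self.locks = {}       # dead server -> winning claimer

    def _try_lock(self, dead, claimer):
        # Emulates an atomic znode create(): fails if the lock already exists.
        if dead in self.locks:
            return False
        self.locks[dead] = claimer
        return True

    def claim_queues(self, dead, claimer):
        """Return True if `claimer` won the race and took over the queues."""
        if not self._try_lock(dead, claimer):
            return False
        # Winner moves the dead server's queues under its own name.
        self.queues.setdefault(claimer, []).extend(self.queues.pop(dead, []))
        return True
```

In a real cluster each region server also keeps a watcher on the others to learn of the death; the simulation simply calls `claim_queues` directly for each contender.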
