Thanks for the update here and the meeting minutes. -n
On Fri, Jul 1, 2022 at 12:46 张铎(Duo Zhang) <[email protected]> wrote:

> Meeting notes:
>
> Attendees: Duo Zhang, Liangjun He, Xin Sun, Tianhang Tang, Yu Li
>
> First Duo explained the design doc again, then the others asked questions
> and we discussed them. Let me post the conclusions here:
>
> 1. If we rely on an HBase table to store the replication metadata, how do
> we use the replication sync up tool to replicate data to the peer cluster
> once the source cluster is fully down?
> We agree that this is a limitation compared to the old zookeeper based
> implementation. Maybe we could mirror the replication metadata to another
> storage system, or use the maintenance mode to bring the hbase:replication
> table online. Not a blocker issue, but at least we need to clearly
> document this.
>
> 2. Since we removed the zookeeper usage, the pressure on zookeeper will
> now move to HBase and HDFS. Will it cause too much pressure and fail the
> cluster under extreme cases?
> After discussion, we mostly agree the risk is low. The heaviest operation
> is claim queue, where we need to list HDFS, but it is the last step of
> SCP, where we have already finished WAL splitting, and it only touches the
> namenode, so in general it will not add too much pressure. Still, when
> implementing, we need to be careful to avoid touching HDFS too much.
>
> 3. If hbase:replication is offline, will it hang replication?
> This is by design, but we should try our best not to hang normal
> read/write while hbase:replication is offline.
>
> 4. Does the sourceServerName in ReplicationQueueId mean the last region
> server which holds the replication queue?
> No, it is the FIRST region server which holds the replication queue. The
> old design tracked all the region servers which held the replication queue
> in the queue id, but actually we only need the first region server for
> locating the WAL files.
>
> 5. How to predict the pressure on the new hbase:replication table?
> For a normal cluster, most of the pressure comes from updates of the
> replication offset. This can be estimated easily as write_throughput /
> replication_size_per_offset_update. Of course, the qps doubles if the
> number of replication peers doubles.
>
> Later we talked about general problems with replication. For example, if
> we have 20~30 replication peers, not only is the pressure on the
> replication metadata a problem, the pressure from reading HDFS is also a
> big problem. We discussed several possible solutions, like having only one
> thread read WAL files instead of one thread per peer, caching the newest
> several WAL files in memory, or having only one replication peer mirror
> all the WAL data to kafka and then using kafka to replicate to other
> systems, etc. Anyway, this is not related to the main topic.
>
> We all agree that the current design doc is huge and there are still lots
> of details in each area. We will open sub tasks to cover the several
> areas, split the design doc into several pieces, and keep polishing it.
>
> Thanks.
>
> 张铎(Duo Zhang) <[email protected]> wrote on Wed, Jun 29, 2022 at 10:23:
>
> > We plan to hold an online meeting from 2PM to 3PM, 1st July, GMT+8,
> > using tencent meeting.
> >
> > 阿米朵 invites you to a Tencent Meeting:
> >> Meeting topic: HBase Replication Queue Storage
> >> Meeting time: 2022/07/01 14:00-15:00 (GMT+08:00) China Standard Time
> >> - Beijing
> >>
> >> Click this url to join the meeting, or add it to your meeting list:
> >> https://meeting.tencent.com/dm/kZQdGasowxXP
> >>
> >> Tencent Meeting number: 430-524-288
> >> Meeting password: 220701
> >>
> >> One-tap dial-in from a mobile phone:
> >> +8675536550000,,430524288# (Mainland China)
> >> +85230018898,,,2,430524288# (Hong Kong, China)
> >>
> >> Dial in by your location:
> >> +8675536550000 (Mainland China)
> >> +85230018898 (Hong Kong, China)
> >>
> >> Copy this information and open the Tencent Meeting mobile app to join.
> >>
> >
> > More attendees are always welcome :)
> >
> > 张铎(Duo Zhang) <[email protected]> wrote on Tue, Jun 21, 2022 at 12:46:
> >
> >> Liangjun He replied on jira that he wants to join the work.
> >>
> >> We plan to schedule an online meeting soon to discuss it.
> >>
> >> We will post the meeting schedule here when we find a suitable time.
> >>
> >> Feel free to join if you are interested.
> >>
> >> Thanks.
> >>
> >> 张铎(Duo Zhang) <[email protected]> wrote on Thu, Jun 16, 2022 at 22:07:
> >>
> >>> Thanks Andrew for the hard work on closing stale issues, and let me
> >>> bump this thread...
> >>>
> >>> 张铎(Duo Zhang) <[email protected]> wrote on Sun, Jun 12, 2022 at 21:25:
> >>>
> >>>> The issue for this is HBASE-27109[1], and it is a sub task of
> >>>> HBASE-15867[2], where we want to remove the dependency on zk in the
> >>>> replication implementation. Once HBASE-15867 is done, there is no
> >>>> permanent state on zk any more, which means we are always safe to
> >>>> rebuild a cluster with a fresh zk instance.
> >>>>
> >>>> The related issues were opened long ago, such as HBASE-10295[3],
> >>>> HBASE-13773[4], etc. HBASE-15867 nearly solved the problem, as we
> >>>> have already abstracted a replication peer storage interface and a
> >>>> replication queue storage interface; the idea was to provide two
> >>>> table based storages and then the problem would be solved. But then
> >>>> we found out there is still a cyclic dependency which could fail the
> >>>> startup of a cluster. In the current replication implementation, once
> >>>> we create a new WAL writer, we need to record it in the replication
> >>>> queue storage before writing data to it. But if we move the
> >>>> replication queue storage to an hbase table, then we need this table
> >>>> to be writable first before we can record the new WAL file in it. On
> >>>> a new cluster, this will hang the cluster start up, as besides
> >>>> hbase:meta, no region can be online...
> >>>>
> >>>> In HBASE-27109, I propose a new way to track the WAL files. Please
> >>>> see the design doc[5] for more details. You may find that the
> >>>> implementation of claim queues and the replication log cleaner
> >>>> becomes more complicated. This is a trade off: if we want to make
> >>>> life easier when writing and tracking WALs, then we need to deal with
> >>>> the complexity in other places. But I think it is worthwhile, as
> >>>> writing the WAL is on the critical path of our main read/write flow,
> >>>> while claim queues and the replication log cleaner are both
> >>>> background tasks.
> >>>>
> >>>> Feel free to reply here, on the jira issue, or on the design doc.
> >>>> Suggestions are always welcome.
> >>>>
> >>>> 1. https://issues.apache.org/jira/browse/HBASE-27109
> >>>> 2. https://issues.apache.org/jira/browse/HBASE-15867
> >>>> 3. https://issues.apache.org/jira/browse/HBASE-10295
> >>>> 4. https://issues.apache.org/jira/browse/HBASE-13773
> >>>> 5. https://docs.google.com/document/d/1QrSFlDQblxc12aTomE64sVmghrs_g5ys4fU9wGOdMHk/edit?usp=sharing
> >>>
> >
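The startup cyclic dependency described in the Jun 12 email can be made concrete with a toy model. This is an illustrative sketch only, not HBase code; the function names and the boolean flag are invented for the example:

```python
# Toy model of the cyclic dependency from HBASE-27109:
# opening a region needs a WAL; creating a WAL must first be recorded
# in the replication queue storage; if that storage is itself an HBase
# table, recording requires that table's region to already be open.

def create_wal(region: str, queue_storage_online: bool) -> None:
    # Old behavior: record the new WAL file in the queue storage
    # before writing any data to it.
    if not queue_storage_online:
        raise RuntimeError(
            "cannot record new WAL: hbase:replication is not online yet")

def open_region(region: str, queue_storage_online: bool) -> str:
    # hbase:meta is the only region that can come online without the
    # queue storage in this toy model (as noted in the email).
    if region != "hbase:meta":
        create_wal(region, queue_storage_online)
    return f"{region} online"

# Fresh cluster: hbase:meta comes up, but hbase:replication itself
# cannot open, because its own WAL cannot be recorded anywhere.
print(open_region("hbase:meta", queue_storage_online=False))
try:
    open_region("hbase:replication", queue_storage_online=False)
except RuntimeError as e:
    print("startup hangs:", e)
```

HBASE-27109 breaks this cycle by changing how WAL files are tracked, so that creating a WAL writer no longer requires the queue storage table to be writable first.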

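The pressure estimate from point 5 of the meeting notes (offset-update qps = write_throughput / replication_size_per_offset_update, scaling with the number of peers) can be sketched as a back-of-the-envelope calculation. The function name and example numbers are illustrative, not taken from HBase:

```python
def offset_update_qps(write_throughput_bytes_per_sec: float,
                      bytes_per_offset_update: float,
                      num_peers: int) -> float:
    """Rough estimate of write qps against hbase:replication.

    Each peer records one offset update per `bytes_per_offset_update`
    of WAL data shipped, so the update rate scales linearly with the
    number of replication peers (point 5 of the meeting notes).
    """
    return write_throughput_bytes_per_sec / bytes_per_offset_update * num_peers

# e.g. 100 MB/s of writes, one offset update per 1 MB shipped, 2 peers:
# 100 updates/s per peer, 200 updates/s in total.
print(offset_update_qps(100 * 1024 * 1024, 1024 * 1024, 2))
```

Even with tens of peers, this stays in the low thousands of updates per second for such a workload, which matches the meeting's conclusion that the metadata-update load itself is manageable while the HDFS read pressure from many peers is the bigger concern.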