Thank you, Duo. I have also encountered this issue, and it is somewhere on the to-do list. Let me review the PR; this is fantastic.

On Wed, Mar 30, 2022 at 5:26 PM 张铎(Duo Zhang) <[email protected]> wrote:

> Liangjun He from Alibaba has tested the patch on their cloud deployment
> rebuilding scenario, and it works fine if we stop masters first and then
> region servers. Please check the comments on the Jira issue for more
> details.
>
> Let me try to get this in. This will be very useful for users who deploy
> HBase on cloud.
>
> Thanks.
>
> 张铎(Duo Zhang) <[email protected]> wrote on Mon, Mar 28, 2022 at 12:24:
>
> > The issue aims to solve the problem of redeploying HBase clusters on
> > cloud.
> >
> > I cannot find the issue but IIRC, the AWS guys said they tried to do the
> > following steps while redeploying a customer's HBase cluster:
> >
> > 1. Disable writes to the cluster, flush all data to disk (which is
> >    actually S3)
> > 2. Recreate the cluster with a set of new machines, and also a new zk and
> >    a new HDFS (for writing WAL)
> >
> > Then the new cluster just hung there and no regions were online.
> >
> > This is because in HMaster startup, we rely on scanning the WAL directory
> > on HDFS to get the previous live region servers, and we will compare the
> > list with the list stored on zookeeper to find out dead region servers and
> > schedule SCPs for them, and then the SCPs will bring the regions online.
> >
> > The problem for the above redeploying operation is, the WAL directory is
> > also cleaned, so we cannot get the previous live region servers, so no SCP
> > will be scheduled.
> >
> > This is a bit annoying as we have already flushed all the data out so it
> > should be safe to delete all the WAL data.
> >
> > The idea in HBASE-26245 is to also store a copy of the live region servers
> > in master local region, so when restarting, we could also load the previous
> > live region servers from master local region, instead of only relying on
> > the WAL directory. In this way we could solve the problem of the above
> > redeploying operation.
> >
> > The PR is also ready.
> >
> > https://github.com/apache/hbase/pull/4136
> >
> > Suggestions and reviews are always welcome.
> >
> > Thanks.
> >
>

-- 
Best regards,
Andrew

Unrest, ignorance distilled, nihilistic imbeciles -
    It's what we’ve earned
Welcome, apocalypse, what’s taken you so long?
Bring us the fitting end that we’ve been counting on
   - A23, Welcome, Apocalypse
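
The startup logic described in the quoted message can be sketched roughly as below. This is only an illustration of the set comparison, not the code in the PR: the class name, the method name `findDeadServers`, and the example server names are all hypothetical, and the real master works with HBase's own server-name and procedure types rather than plain strings.

```java
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch only; none of these names come from the HBase code base.
public class DeadServerSketch {

    // Servers that were live before the restart but are not registered in
    // ZooKeeper now are considered dead and would each get an SCP scheduled.
    static Set<String> findDeadServers(Set<String> previousFromWalDir,
                                       Set<String> previousFromMasterLocalRegion,
                                       Set<String> liveInZooKeeper) {
        Set<String> dead = new TreeSet<>(previousFromWalDir);
        // The idea described for HBASE-26245: also consult the copy kept in
        // the master local region, not only the WAL directory.
        dead.addAll(previousFromMasterLocalRegion);
        dead.removeAll(liveInZooKeeper);
        return dead;
    }

    public static void main(String[] args) {
        // Redeploy scenario: the WAL directory was wiped, so it lists nothing.
        Set<String> fromWalDir = Set.of();
        // Hypothetical server names in "host,port,startcode" form.
        Set<String> fromLocalRegion = Set.of(
            "rs-old-1.example.com,16020,1648000000000",
            "rs-old-2.example.com,16020,1648000000001");
        Set<String> liveInZk = Set.of("rs-new-1.example.com,16020,1648600000000");

        // WAL directory only: nothing is found, so no SCP is scheduled.
        System.out.println("WAL dir only: "
            + findDeadServers(fromWalDir, Set.of(), liveInZk));
        // With the master local region copy: the old servers are found.
        System.out.println("With local region: "
            + findDeadServers(fromWalDir, fromLocalRegion, liveInZk));
    }
}
```

With an empty WAL directory the first call returns an empty set, which matches the hang described above: no dead servers means no SCPs and no regions brought online. The second call recovers the old servers from the second source.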
