[ https://issues.apache.org/jira/browse/HBASE-23269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jianzhen Xu updated HBASE-23269:
--------------------------------
    Description: 
Currently, when the rs_group feature is enabled and an HBase cluster is upgraded to a 
higher version, assignment of the meta table may fail. This eventually makes the 
whole cluster unavailable, and availability drops to 0. It affects all versions 
that include the rs_group feature. It also occurs when the rs_group patch was 
applied to a version below 1.4 and the cluster is then upgraded to 1.4+.
 When this happens during an upgrade:
 * During a rolling upgrade of the regionservers, the problem always occurs if the 
first upgraded RS is not in the same rs_group as the meta table.
 The symptoms are as follows:

!image-2019-11-07-14-50-11-877.png!

!image-2019-11-07-14-51-38-858.png!

The cause is as follows: during a rolling upgrade of the first regionserver node 
(denoted RS1), RS1 starts up and re-registers in ZooKeeper. The master detects this 
through the watcher in RegionServerTracker, which eventually calls 
HMaster.checkIfShouldMoveSystemRegionAsync().

The logic of this method is as follows:

 
{code:java}
public void checkIfShouldMoveSystemRegionAsync() {
  new Thread(new Runnable() {
    @Override
    public void run() {
      try {
        synchronized (checkIfShouldMoveSystemRegionLock) {
          // RS register on ZK after reports startup on master
          List<HRegionInfo> regionsShouldMove = new ArrayList<>();
          // Collect system-table regions hosted on RSs whose version is lower
          // than the highest version currently in the cluster.
          for (ServerName server : getExcludedServersForSystemTable()) {
            regionsShouldMove.addAll(getCarryingSystemTables(server));
          }
          if (!regionsShouldMove.isEmpty()) {
            List<RegionPlan> plans = new ArrayList<>();
            for (HRegionInfo regionInfo : regionsShouldMove) {
              // forceNewPlan = true: always compute a fresh destination.
              RegionPlan plan = getRegionPlan(regionInfo, true);
              if (regionInfo.isMetaRegion()) {
                // Must move meta region first.
                balance(plan);
              } else {
                plans.add(plan);
              }
            }
            // Then move the remaining system-table regions.
            for (RegionPlan plan : plans) {
              balance(plan);
            }
          }
        }
      } catch (Throwable t) {
        LOG.error(t);
      }
    }
  }).start();
}{code}
 
 # First, getExcludedServersForSystemTable() is executed: it finds the highest 
version among all regionservers and returns every RS running a lower version 
(call this LowVersionRSList).
 # If step 1 returns a non-empty list, the master iterates over it. Every 
system-table region hosted on one of those RSs is added to regionsShouldMove. If 
the first upgraded RS is not in the rs_group where the system tables are located, 
the meta region is therefore added to regionsShouldMove.
 # For each region in regionsShouldMove, the master gets a RegionPlan with 
forceNewPlan set to true:
 ## It gets all regionservers whose version is below the highest version.
 ## It removes the servers from step 3.1 from the set of all online 
regionservers. The result contains only the RSs that have already been upgraded; 
call it destServers.
 ## Because forceNewPlan is true, the destination server is obtained through 
balancer.randomAssignment(region, destServers). Since the rs_group feature is 
enabled, the balancer here is RSGroupBasedLoadBalancer, whose logic is:
 ### It intersects destServers from step 3.2 with the online regionservers in the 
rs_group of the current region. When the region belongs to a system table and none 
of the upgraded RSs are in that table's rs_group, the intersection is empty, and 
the destination regionserver falls back to the hard-coded BOGUS_SERVER_NAME 
(localhost,1).
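The steps above can be modeled with a minimal Java sketch. It is a simplified illustration, not HBase code: servers are plain strings, versions are integers, and the class name SystemRegionPlanSketch, its methods, and the servers rs1..rs4 are hypothetical. The real balancer chooses among candidates at random; the sketch takes the smallest name so the output is deterministic.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, simplified model of how the destination for a system region
 * is chosen during a rolling upgrade with rs_group enabled. Not HBase code.
 */
public class SystemRegionPlanSketch {
  // Stand-in for the localhost,1 fallback used when no candidate remains.
  public static final String BOGUS_SERVER_NAME = "localhost,1";

  // Servers running a version lower than the highest version seen
  // (the role played by getExcludedServersForSystemTable()).
  public static Set<String> lowVersionServers(Map<String, Integer> versions) {
    int highest = versions.values().stream().mapToInt(Integer::intValue).max().orElse(0);
    Set<String> low = new HashSet<>();
    for (Map.Entry<String, Integer> e : versions.entrySet()) {
      if (e.getValue() < highest) {
        low.add(e.getKey());
      }
    }
    return low;
  }

  // Destination for a system region whose rs_group contains groupServers.
  public static String chooseDestination(Map<String, Integer> versions, Set<String> groupServers) {
    // destServers: online servers minus low-version ones, so only upgraded RSs remain.
    Set<String> destServers = new HashSet<>(versions.keySet());
    destServers.removeAll(lowVersionServers(versions));
    // Group-aware filtering: keep only upgraded RSs inside the region's rs_group.
    destServers.retainAll(groupServers);
    if (destServers.isEmpty()) {
      return BOGUS_SERVER_NAME; // an assignment to this server can never succeed
    }
    // The real balancer picks at random; take the smallest name for determinism.
    return destServers.stream().sorted().findFirst().get();
  }

  public static void main(String[] args) {
    Set<String> metaGroup = new HashSet<>(Arrays.asList("rs1", "rs2"));
    Map<String, Integer> versions = new HashMap<>();
    versions.put("rs1", 1);
    versions.put("rs2", 1);
    versions.put("rs3", 1);
    versions.put("rs4", 1);

    versions.put("rs3", 2); // first upgraded RS is outside meta's rs_group
    System.out.println(chooseDestination(versions, metaGroup)); // localhost,1

    versions.put("rs1", 2); // now upgrade one RS inside meta's rs_group
    System.out.println(chooseDestination(versions, metaGroup)); // rs1
  }
}
```

The second call in main mirrors the recovery path: once one RS in the meta table's rs_group runs the highest version, the intersection is non-empty and a real server is returned.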

Therefore, when the master assigns a system-table region to localhost,1, the 
assignment inevitably fails. If this problem occurs before the operator is aware of 
the master logic above, it can be fixed by upgrading any node in the rs_group where 
the system tables are located; the cluster then recovers automatically.

During a real upgrade, operators are unlikely to discover this problem without 
reading the master code, and the official documentation does not state that, when 
the rs_group feature is used, the rs_group where the system tables are located must 
be upgraded first. It is therefore easy to fall into this path and crash the 
cluster. According to the code comment, system tables are assigned to the 
highest-version RS for compatibility reasons.

Therefore, without changing the code logic, the official documentation could note 
that when upgrading a cluster that uses the rs_group feature, the rs_group 
containing the system tables should be upgraded first.
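As an illustration of the suggested upgrade order, a minimal Java sketch that puts the servers of the system tables' rs_group at the front of a rolling-upgrade list. The class UpgradeOrderSketch, the server-to-group map, and the group name system_group are hypothetical; in a real cluster the group membership would come from the cluster's rs_group configuration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that orders a rolling upgrade; not part of HBase. */
public class UpgradeOrderSketch {
  // Servers in the system tables' rs_group come first; all other servers keep
  // the iteration order of the input map.
  public static List<String> upgradeOrder(Map<String, String> serverToGroup, String systemGroup) {
    List<String> order = new ArrayList<>();
    List<String> rest = new ArrayList<>();
    for (Map.Entry<String, String> e : serverToGroup.entrySet()) {
      if (systemGroup.equals(e.getValue())) {
        order.add(e.getKey());
      } else {
        rest.add(e.getKey());
      }
    }
    order.addAll(rest);
    return order;
  }

  public static void main(String[] args) {
    // LinkedHashMap keeps insertion order, so the result is deterministic.
    Map<String, String> serverToGroup = new LinkedHashMap<>();
    serverToGroup.put("rs3", "default");
    serverToGroup.put("rs1", "system_group");
    serverToGroup.put("rs4", "default");
    serverToGroup.put("rs2", "system_group");
    System.out.println(upgradeOrder(serverToGroup, "system_group")); // [rs1, rs2, rs3, rs4]
  }
}
```

With this order, the first RS to come back on the new version is already in the system tables' rs_group, so the intersection described above is never empty.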


> Hbase crashed due to two versions of regionservers when rolling upgrading
> -------------------------------------------------------------------------
>
>                 Key: HBASE-23269
>                 URL: https://issues.apache.org/jira/browse/HBASE-23269
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>            Reporter: Jianzhen Xu
>            Priority: Critical
>         Attachments: 9.png, image-2019-11-07-14-49-41-253.png, 
> image-2019-11-07-14-50-11-877.png, image-2019-11-07-14-51-38-858.png
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
