[ https://issues.apache.org/jira/browse/HBASE-23269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987959#comment-16987959 ]

HBase QA commented on HBASE-23269:
----------------------------------

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 37s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} branch-1.4 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 24s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 18s{color} | {color:green} branch-1.4 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 56s{color} | {color:green} branch-1.4 passed with JDK v1.8.0_232 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  8s{color} | {color:green} branch-1.4 passed with JDK v1.7.0_242 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m  1s{color} | {color:green} branch-1.4 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  2m 51s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 47s{color} | {color:green} branch-1.4 passed with JDK v1.8.0_232 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  0s{color} | {color:green} branch-1.4 passed with JDK v1.7.0_242 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  2m 42s{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 33s{color} | {color:green} branch-1.4 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 17s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 56s{color} | {color:green} the patch passed with JDK v1.8.0_232 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  8s{color} | {color:green} the patch passed with JDK v1.7.0_242 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  2m 44s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  2m 34s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.7. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 44s{color} | {color:green} the patch passed with JDK v1.8.0_232 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  0s{color} | {color:green} the patch passed with JDK v1.7.0_242 {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  2s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}113m 44s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  9m  8s{color} | {color:green} hbase-rsgroup in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 53s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}166m 37s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-844/7/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hbase/pull/844 |
| JIRA Issue | HBASE-23269 |
| Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 096d8a8fcb3e 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/HBase-PreCommit-GitHub-PR_PR-844/out/precommit/personality/provided.sh |
| git revision | branch-1.4 / 75e044f |
| Default Java | 1.7.0_242 |
| Multi-JDK versions | /usr/lib/jvm/zulu-8-amd64:1.8.0_232 /usr/lib/jvm/zulu-7-amd64:1.7.0_242 |
| Test Results | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-844/7/testReport/ |
| Max. process+thread count | 3934 (vs. ulimit of 10000) |
| modules | C: hbase-server hbase-rsgroup U: . |
| Console output | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-844/7/console |
| versions | git=1.9.1 maven=3.0.5 findbugs=3.0.1 |
| Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |


This message was automatically generated.



> Hbase crashed due to two versions of regionservers when rolling upgrading
> -------------------------------------------------------------------------
>
>                 Key: HBASE-23269
>                 URL: https://issues.apache.org/jira/browse/HBASE-23269
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 1.4.0, 1.4.2, 1.4.9, 1.4.10, 1.4.11
>            Reporter: Jianzhen Xu
>            Assignee: Jianzhen Xu
>            Priority: Critical
>         Attachments: 9.png, image-2019-11-07-14-49-41-253.png, 
> image-2019-11-07-14-50-11-877.png, image-2019-11-07-14-51-38-858.png
>
>
> Currently, when HBase has the rs_group feature enabled and the cluster is 
> upgraded to a higher version, assignment of the meta table may fail, which 
> eventually makes the whole cluster unavailable (availability drops to 0). 
> This applies to all hbase-1.4.* versions, which introduce the rs_group 
> feature. It also occurs when the rs_group patch was applied to a version 
> below 1.4 and the cluster is then upgraded to 1.4.
>  The problem is triggered during an upgrade as follows:
>  * During a rolling upgrade of regionservers, it always occurs if the first 
> rs to be upgraded is not in the same rs_group as the meta table.
>  The symptoms are as follows:
> !image-2019-11-07-14-50-11-877.png!
> !image-2019-11-07-14-51-38-858.png!
> The cause is as follows: during the rolling upgrade of the first 
> regionserver node (denoted RS1), RS1 starts up and re-registers in zk. The 
> master notices this through the watcher in RegionServerTracker and 
> eventually calls HMaster.checkIfShouldMoveSystemRegionAsync().
> The logic of this method is as follows:
>  
> {code:java}
> public void checkIfShouldMoveSystemRegionAsync() {
>   new Thread(new Runnable() {
>     @Override
>     public void run() {
>       try {
>         synchronized (checkIfShouldMoveSystemRegionLock) {
>           // RS register on ZK after reports startup on master
>           List<HRegionInfo> regionsShouldMove = new ArrayList<>();
>           for (ServerName server : getExcludedServersForSystemTable()) {
>             regionsShouldMove.addAll(getCarryingSystemTables(server));
>           }
>           if (!regionsShouldMove.isEmpty()) {
>             List<RegionPlan> plans = new ArrayList<>();
>             for (HRegionInfo regionInfo : regionsShouldMove) {
>               RegionPlan plan = getRegionPlan(regionInfo, true);
>               if (regionInfo.isMetaRegion()) {
>                 // Must move meta region first.
>                 balance(plan);
>               } else {
>                 plans.add(plan);
>               }
>             }
>             for (RegionPlan plan : plans) {
>               balance(plan);
>             }
>           }
>         }
>       } catch (Throwable t) {
>         LOG.error(t);
>       }
>     }
>   }).start();
> }{code}
>  
>  # First, getExcludedServersForSystemTable() is executed: it gets the 
> highest version among all regionservers and returns every RS below that 
> version, denoted LowVersionRSList.
>  # If step 1 returns a non-empty list, iterate over it. If an rs in the list 
> carries a region of a system table, add that region to the list of regions 
> to move. If the first rs upgraded at this point is not in the rs_group where 
> the system table is located, the region of the meta table is added to 
> regionsShouldMove.
>  # Get a RegionPlan for each region in regionsShouldMove, with the parameter 
> forceNewPlan set to true:
>  ## Get all regionservers whose version is below the highest version.
>  ## Exclude the regionservers from 3.1 from all online regionservers. As a 
> result, only the regionservers that have been upgraded remain in the 
> collection, denoted destServers.
>  ## Since forceNewPlan is true, the destination server is obtained through 
> balancer.randomAssignment(region, destServers). Since the rs_group feature 
> is enabled, the balancer here is RSGroupBasedLoadBalancer. The logic in this 
> method is:
>  ### The destServers obtained in 3.2 are intersected with all online 
> regionservers in the rs_group of the current region. When the region belongs 
> to a system table and the upgraded rs is not in the same rs_group, the 
> result is empty. In that case, the destination regionserver is hard-coded 
> as BOGUS_SERVER_NAME (localhost,1).
> Therefore, when the master assigns a region of the system table to 
> localhost,1, the assignment naturally fails. If you run into this problem 
> without being aware of the master logic above, you can upgrade any node in 
> the rs_group where the system table is located, and the cluster will 
> recover automatically.
> In an actual upgrade, you are unlikely to know about this problem without 
> reading the master code. However, the official documentation does not state 
> that, when the rs_group feature is in use, the rs_group where the system 
> table is located needs to be upgraded first. It is easy to fall into this 
> situation and end up with a crash. According to the code comment, the system 
> tables are assigned to the rs with the highest version for compatibility 
> purposes.
> Therefore, without changing the code logic, the official documentation could 
> note that, when a cluster with the rs_group feature enabled is upgraded, the 
> rs_group hosting the system tables should be upgraded first.
>  
>  
>  
>  
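
The destination selection described in the quoted issue can be sketched as follows. This is a minimal, self-contained model and not the HBase implementation: the class MetaAssignmentSketch, its method names, the string server names, and the integer version numbers are all hypothetical, and the first surviving candidate is picked where the issue describes a random choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the steps in the issue; none of these names are HBase APIs.
class MetaAssignmentSketch {
  // Stand-in for the BOGUS_SERVER_NAME placeholder mentioned in the issue.
  static final String BOGUS_SERVER = "localhost,1";

  // Step 1: every server whose version is below the highest version online.
  static List<String> lowVersionServers(Map<String, Integer> versionByServer) {
    int highest = Integer.MIN_VALUE;
    for (int version : versionByServer.values()) {
      highest = Math.max(highest, version);
    }
    List<String> low = new ArrayList<>();
    for (Map.Entry<String, Integer> entry : versionByServer.entrySet()) {
      if (entry.getValue() < highest) {
        low.add(entry.getKey());
      }
    }
    return low;
  }

  // Step 3: keep only upgraded servers, then intersect with the region's rs_group.
  static String pickDestination(Map<String, Integer> versionByServer,
                                Set<String> groupOfRegion) {
    List<String> destServers = new ArrayList<>(versionByServer.keySet());
    destServers.removeAll(lowVersionServers(versionByServer)); // 3.1 and 3.2
    destServers.retainAll(groupOfRegion);                      // 3.3.1
    // Assumption: first candidate instead of a random one, to stay deterministic.
    return destServers.isEmpty() ? BOGUS_SERVER : destServers.get(0);
  }
}
```

In this model, with servers rs1 (upgraded), rs2 and rs3 (not upgraded) and the meta table's rs_group containing only rs2 and rs3, pickDestination returns the placeholder localhost,1. Once rs2 is upgraded as well, it returns rs2, which corresponds to the recovery path described in the issue.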



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
