[
https://issues.apache.org/jira/browse/HBASE-14498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16774429#comment-16774429
]
Hadoop QA commented on HBASE-14498:
-----------------------------------
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 22s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 34s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 9s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 15s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 57s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 50s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 42s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 58s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 8m 34s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 44s{color} | {color:green} hbase-zookeeper in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}139m 51s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 48s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}183m 49s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-14498 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12959638/HBASE-14498.009.patch |
| Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux c9f669f771e8 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh |
| git revision | master / 9a55cbb2c1 |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/16078/testReport/ |
| Max. process+thread count | 4826 (vs. ulimit of 10000) |
| modules | C: hbase-zookeeper hbase-server U: . |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/16078/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |
This message was automatically generated.
> Master stuck in infinite loop when all Zookeeper servers are unreachable (and
> RS may run after losing its znode)
> ----------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-14498
> URL: https://issues.apache.org/jira/browse/HBASE-14498
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 3.0.0, 1.5.0, 2.0.0, 2.2.0
> Reporter: Y. SREENIVASULU REDDY
> Assignee: Pankaj Kumar
> Priority: Blocker
> Fix For: 3.0.0
>
> Attachments: HBASE-14498-V2.patch, HBASE-14498-V3.patch,
> HBASE-14498-V4.patch, HBASE-14498-V5.patch, HBASE-14498-V6.patch,
> HBASE-14498-V6.patch, HBASE-14498-addendum.patch,
> HBASE-14498-branch-1.2.patch, HBASE-14498-branch-1.3-V2.patch,
> HBASE-14498-branch-1.3.patch, HBASE-14498-branch-1.4.patch,
> HBASE-14498-branch-1.patch, HBASE-14498.007.patch, HBASE-14498.008.patch,
> HBASE-14498.009.patch, HBASE-14498.009.patch, HBASE-14498.master.001.patch,
> HBASE-14498.master.002.patch, HBASE-14498.patch
>
>
> We hit a strange scenario in our production environment.
> In an HA cluster,
> > The active master (HM1) could not connect to any ZooKeeper server (due to a
> > network breakdown between the master machine and the ZooKeeper servers).
> {code}
> 2015-09-26 15:24:47,508 INFO
> [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host:2181)]
> zookeeper.ClientCnxn: Client session timed out, have not heard from server in
> 33463ms for sessionid 0x104576b8dda0002, closing socket connection and
> attempting reconnect
> 2015-09-26 15:24:47,877 INFO
> [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host1:2181)]
> client.FourLetterWordMain: connecting to ZK-Host1 2181
> 2015-09-26 15:24:48,236 INFO [main-SendThread(ZK-Host1:2181)]
> client.FourLetterWordMain: connecting to ZK-Host1 2181
> 2015-09-26 15:24:49,879 WARN
> [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host1:2181)]
> zookeeper.ClientCnxn: Can not get the principle name from server ZK-Host1
> 2015-09-26 15:24:49,879 INFO
> [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host1:2181)]
> zookeeper.ClientCnxn: Opening socket connection to server
> ZK-Host1/ZK-IP1:2181. Will not attempt to authenticate using SASL (unknown
> error)
> 2015-09-26 15:24:50,238 WARN [main-SendThread(ZK-Host1:2181)]
> zookeeper.ClientCnxn: Can not get the principle name from server ZK-Host1
> 2015-09-26 15:24:50,238 INFO [main-SendThread(ZK-Host1:2181)]
> zookeeper.ClientCnxn: Opening socket connection to server
> ZK-Host1/ZK-Host1:2181. Will not attempt to authenticate using SASL (unknown
> error)
> 2015-09-26 15:25:17,470 INFO [main-SendThread(ZK-Host1:2181)]
> zookeeper.ClientCnxn: Client session timed out, have not heard from server in
> 30023ms for sessionid 0x2045762cc710006, closing socket connection and
> attempting reconnect
> 2015-09-26 15:25:17,571 WARN [master/HM1-Host/HM1-IP:16000]
> zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper,
> quorum=ZK-Host:2181,ZK-Host1:2181,ZK-Host2:2181,
> exception=org.apache.zookeeper.KeeperException$ConnectionLossException:
> KeeperErrorCode = ConnectionLoss for /hbase/master
> 2015-09-26 15:25:17,872 INFO [main-SendThread(ZK-Host:2181)]
> client.FourLetterWordMain: connecting to ZK-Host 2181
> 2015-09-26 15:25:19,874 WARN [main-SendThread(ZK-Host:2181)]
> zookeeper.ClientCnxn: Can not get the principle name from server ZK-Host
> 2015-09-26 15:25:19,874 INFO [main-SendThread(ZK-Host:2181)]
> zookeeper.ClientCnxn: Opening socket connection to server ZK-Host/ZK-IP:2181.
> Will not attempt to authenticate using SASL (unknown error)
> {code}
> > Since HM1 could not connect to any ZooKeeper server, it was never notified
> > of the session timeout by the ZooKeeper server side, and HM1 did not abort.
> > When the ZooKeeper session timed out, the standby master (HM2) registered
> > itself as the active master.
> > HM2 keeps waiting for region servers to report to it as part of active
> > master initialization.
> {noformat}
> 2015-09-26 15:24:44,928 | INFO | HM2-Host:21300.activeMasterManager | Waiting
> for region servers count to settle; currently checked in 0, slept for 0 ms,
> expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval
> of 1500 ms. |
> org.apache.hadoop.hbase.master.ServerManager.waitForRegionServers(ServerManager.java:1011)
> ---
> ---
> 2015-09-26 15:32:50,841 | INFO | HM2-Host:21300.activeMasterManager | Waiting
> for region servers count to settle; currently checked in 0, slept for 483913
> ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms,
> interval of 1500 ms. |
> org.apache.hadoop.hbase.master.ServerManager.waitForRegionServers(ServerManager.java:1011)
> {noformat}
> > Meanwhile, region servers keep reporting to HM1 at a 3-second interval. A
> > region server retrieves the master location from ZooKeeper only when it
> > cannot connect to the master (ServiceException).
> By the current design, region servers will not report to HM2 until HM1
> aborts, so HM2 will exit (InitializationMonitor) and then wait for region
> servers again, in a loop.
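
The failure mode described above can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions: `Master`, `RegionServer`, the `zk` dictionary, and `simulate` are hypothetical stand-ins invented for illustration, not HBase or ZooKeeper APIs. The one behavior taken from the description is that a region server re-reads the master location from ZooKeeper only after it fails to reach its current master.

```python
class Master:
    """Toy stand-in for an HBase master (hypothetical, not the HMaster API)."""

    def __init__(self, name):
        self.name = name
        self.alive = True        # reachable by region servers
        self.checked_in = set()  # region servers that have reported here

    def report(self, rs_name):
        if not self.alive:
            raise ConnectionError(f"{self.name} unreachable")
        self.checked_in.add(rs_name)


class RegionServer:
    """Caches the master location; refreshes it only on a failed report."""

    def __init__(self, name, zk, masters):
        self.name = name
        self.zk = zk                       # dict standing in for the master znode
        self.masters = masters
        self.cached_master = zk["master"]  # looked up once at startup

    def report_once(self):
        try:
            self.masters[self.cached_master].report(self.name)
        except ConnectionError:
            # Only now is the master location re-read from "ZooKeeper".
            self.cached_master = self.zk["master"]
            self.masters[self.cached_master].report(self.name)


def simulate(hm1_aborts, rounds=5):
    """Return how many region servers have checked in with HM2."""
    masters = {"HM1": Master("HM1"), "HM2": Master("HM2")}
    zk = {"master": "HM1"}
    servers = [RegionServer(f"RS{i}", zk, masters) for i in range(2)]

    # HM1 loses its ZooKeeper session; HM2 registers as the active master.
    zk["master"] = "HM2"
    # If HM1 aborts, region servers can no longer reach it.
    masters["HM1"].alive = not hm1_aborts

    for _ in range(rounds):
        for rs in servers:
            rs.report_once()
    return len(masters["HM2"].checked_in)


print(simulate(hm1_aborts=False))  # HM1 stays up: HM2 never sees a region server
print(simulate(hm1_aborts=True))   # HM1 gone: region servers fail over to HM2
```

In this model, as long as HM1 keeps answering, no region server ever consults ZooKeeper, so HM2 stays at zero checked-in servers no matter how many report rounds pass. That matches the "currently checked in 0" log lines above.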