[ 
https://issues.apache.org/jira/browse/HBASE-14498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16112167#comment-16112167
 ] 

Hadoop QA commented on HBASE-14498:
-----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
38s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
45s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
49s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
26s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
11s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
45s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
16s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
30m 45s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha4. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
53s{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}114m 31s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
42s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}169m 29s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Timed out junit tests | org.apache.hadoop.hbase.coprocessor.TestHTableWrapper 
|
|   | org.apache.hadoop.hbase.mapreduce.TestHRegionPartitioner |
|   | org.apache.hadoop.hbase.snapshot.TestSnapshotClientRetries |
|   | org.apache.hadoop.hbase.master.TestMasterFailover |
|   | org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancer2 |
|   | org.apache.hadoop.hbase.TestFullLogReconstruction |
|   | org.apache.hadoop.hbase.coprocessor.TestRegionObserverScannerOpenHook |
|   | org.apache.hadoop.hbase.master.TestAssignmentManagerMetrics |
|   | org.apache.hadoop.hbase.wal.TestWALFiltering |
|   | org.apache.hadoop.hbase.quotas.TestQuotaStatusRPCs |
|   | org.apache.hadoop.hbase.TestMultiVersions |
|   | org.apache.hadoop.hbase.coprocessor.TestCoprocessorMetrics |
|   | org.apache.hadoop.hbase.quotas.TestMasterSpaceQuotaObserver |
|   | org.apache.hadoop.hbase.mapreduce.TestCellCounter |
|   | org.apache.hadoop.hbase.TestZooKeeper |
|   | org.apache.hadoop.hbase.wal.TestWALSplitCompressed |
|   | org.apache.hadoop.hbase.filter.TestScanRowPrefix |
|   | org.apache.hadoop.hbase.quotas.TestRegionSizeUse |
|   | org.apache.hadoop.hbase.coprocessor.TestMasterObserver |
|   | org.apache.hadoop.hbase.util.TestHBaseFsckEncryption |
|   | org.apache.hadoop.hbase.coprocessor.TestWALObserver |
|   | org.apache.hadoop.hbase.master.TestMasterWalManager |
|   | org.apache.hadoop.hbase.coprocessor.TestIncrementTimeRange |
|   | org.apache.hadoop.hbase.master.TestSplitLogManager |
|   | org.apache.hadoop.hbase.master.TestRollingRestart |
|   | org.apache.hadoop.hbase.util.TestMiniClusterLoadEncoded |
|   | org.apache.hadoop.hbase.quotas.TestSpaceQuotas |
|   | org.apache.hadoop.hbase.master.TestDistributedLogSplitting |
|   | org.apache.hadoop.hbase.util.TestMiniClusterLoadSequential |
|   | org.apache.hadoop.hbase.quotas.TestQuotaObserverChoreWithMiniCluster |
|   | 
org.apache.hadoop.hbase.coprocessor.TestRegionObserverForAddingMutationsFromCoprocessors
 |
|   | org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat2 |
|   | org.apache.hadoop.hbase.util.TestFromClientSide3WoUnsafe |
|   | org.apache.hadoop.hbase.filter.TestFilterListOrOperatorWithBlkCnt |
|   | org.apache.hadoop.hbase.TestNamespace |
|   | org.apache.hadoop.hbase.TestMovedRegionsCleaner |
|   | org.apache.hadoop.hbase.coprocessor.TestOpenTableInCoprocessor |
|   | org.apache.hadoop.hbase.mapreduce.TestHashTable |
|   | org.apache.hadoop.hbase.mapreduce.TestCopyTable |
|   | org.apache.hadoop.hbase.mapreduce.TestSyncTable |
|   | org.apache.hadoop.hbase.quotas.TestSpaceQuotasWithSnapshots |
|   | org.apache.hadoop.hbase.filter.TestMultiRowRangeFilter |
|   | org.apache.hadoop.hbase.TestHBaseOnOtherDfsCluster |
|   | org.apache.hadoop.hbase.util.TestRegionMover |
|   | org.apache.hadoop.hbase.master.snapshot.TestSnapshotFileCache |
|   | org.apache.hadoop.hbase.quotas.TestQuotaObserverChoreRegionReports |
|   | org.apache.hadoop.hbase.snapshot.TestMobRestoreFlushSnapshotFromClient |
|   | org.apache.hadoop.hbase.master.TestRestartCluster |
|   | org.apache.hadoop.hbase.util.TestMiniClusterLoadParallel |
|   | org.apache.hadoop.hbase.snapshot.TestRestoreFlushSnapshotFromClient |
|   | org.apache.hadoop.hbase.master.TestWarmupRegion |
|   | org.apache.hadoop.hbase.TestLocalHBaseCluster |
|   | 
org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithRemove
 |
|   | org.apache.hadoop.hbase.snapshot.TestMobFlushSnapshotFromClient |
|   | org.apache.hadoop.hbase.quotas.TestSuperUserQuotaPermissions |
|   | org.apache.hadoop.hbase.coprocessor.TestRegionObserverBypass |
|   | org.apache.hadoop.hbase.TestServerSideScanMetricsFromClientSide |
|   | org.apache.hadoop.hbase.snapshot.TestFlushSnapshotFromClient |
|   | org.apache.hadoop.hbase.TestMetaTableAccessor |
|   | 
org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort
 |
|   | org.apache.hadoop.hbase.master.TestRegionPlacement2 |
|   | org.apache.hadoop.hbase.util.TestConnectionCache |
|   | 
org.apache.hadoop.hbase.coprocessor.TestMasterCoprocessorExceptionWithRemove |
|   | org.apache.hadoop.hbase.wal.TestWALFactory |
|   | org.apache.hadoop.hbase.wal.TestBoundedRegionGroupingStrategy |
|   | org.apache.hadoop.hbase.util.TestIdReadWriteLock |
|   | org.apache.hadoop.hbase.filter.TestFuzzyRowFilterEndToEnd |
|   | org.apache.hadoop.hbase.coprocessor.TestCoprocessorStop |
|   | org.apache.hadoop.hbase.mapreduce.TestTableSnapshotInputFormat |
|   | org.apache.hadoop.hbase.TestPartialResultsFromClientSide |
|   | 
org.apache.hadoop.hbase.coprocessor.TestMasterCoprocessorExceptionWithAbort |
|   | org.apache.hadoop.hbase.tool.TestCanaryTool |
|   | org.apache.hadoop.hbase.master.TestMasterOperationsForRegionReplicas |
|   | org.apache.hadoop.hbase.util.TestFSUtils |
|   | org.apache.hadoop.hbase.wal.TestWALSplit |
|   | org.apache.hadoop.hbase.mapreduce.TestImportTsv |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.13.1 Server=1.13.1 Image:yetus/hbase:bdc94b1 |
| JIRA Issue | HBASE-14498 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12880120/HBASE-14498.master.001.patch
 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  
hbaseanti  checkstyle  compile  |
| uname | Linux d4b74e695cbf 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 
11:05:26 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 504a1f1 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC3 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HBASE-Build/7899/artifact/patchprocess/whitespace-eol.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HBASE-Build/7899/artifact/patchprocess/patch-unit-hbase-server.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/7899/testReport/ |
| modules | C: hbase-client hbase-server U: . |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/7899/console |
| Powered by | Apache Yetus 0.4.0   http://yetus.apache.org |


This message was automatically generated.



> Master stuck in infinite loop when all Zookeeper servers are unreachable
> ------------------------------------------------------------------------
>
>                 Key: HBASE-14498
>                 URL: https://issues.apache.org/jira/browse/HBASE-14498
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>            Reporter: Y. SREENIVASULU REDDY
>            Assignee: Pankaj Kumar
>            Priority: Critical
>             Fix For: 2.0.0
>
>         Attachments: HBASE-14498.master.001.patch, HBASE-14498.patch, 
> HBASE-14498-V2.patch, HBASE-14498-V3.patch, HBASE-14498-V4.patch, 
> HBASE-14498-V5.patch, HBASE-14498-V6.patch, HBASE-14498-V6.patch
>
>
> We met a weird scenario in our production environment.
> In a HA cluster,
> > Active Master (HM1) is not able to connect to any Zookeeper server (due to 
> > N/w breakdown on master machine network with Zookeeper servers).
> {code}
> 2015-09-26 15:24:47,508 INFO 
> [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host:2181)] 
> zookeeper.ClientCnxn: Client session timed out, have not heard from server in 
> 33463ms for sessionid 0x104576b8dda0002, closing socket connection and 
> attempting reconnect
> 2015-09-26 15:24:47,877 INFO 
> [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host1:2181)] 
> client.FourLetterWordMain: connecting to ZK-Host1 2181
> 2015-09-26 15:24:48,236 INFO [main-SendThread(ZK-Host1:2181)] 
> client.FourLetterWordMain: connecting to ZK-Host1 2181
> 2015-09-26 15:24:49,879 WARN 
> [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host1:2181)] 
> zookeeper.ClientCnxn: Can not get the principle name from server ZK-Host1
> 2015-09-26 15:24:49,879 INFO 
> [HM1-Host:16000.activeMasterManager-SendThread(ZK-Host1:2181)] 
> zookeeper.ClientCnxn: Opening socket connection to server 
> ZK-Host1/ZK-IP1:2181. Will not attempt to authenticate using SASL (unknown 
> error)
> 2015-09-26 15:24:50,238 WARN [main-SendThread(ZK-Host1:2181)] 
> zookeeper.ClientCnxn: Can not get the principle name from server ZK-Host1
> 2015-09-26 15:24:50,238 INFO [main-SendThread(ZK-Host1:2181)] 
> zookeeper.ClientCnxn: Opening socket connection to server 
> ZK-Host1/ZK-Host1:2181. Will not attempt to authenticate using SASL (unknown 
> error)
> 2015-09-26 15:25:17,470 INFO [main-SendThread(ZK-Host1:2181)] 
> zookeeper.ClientCnxn: Client session timed out, have not heard from server in 
> 30023ms for sessionid 0x2045762cc710006, closing socket connection and 
> attempting reconnect
> 2015-09-26 15:25:17,571 WARN [master/HM1-Host/HM1-IP:16000] 
> zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, 
> quorum=ZK-Host:2181,ZK-Host1:2181,ZK-Host2:2181, 
> exception=org.apache.zookeeper.KeeperException$ConnectionLossException: 
> KeeperErrorCode = ConnectionLoss for /hbase/master
> 2015-09-26 15:25:17,872 INFO [main-SendThread(ZK-Host:2181)] 
> client.FourLetterWordMain: connecting to ZK-Host 2181
> 2015-09-26 15:25:19,874 WARN [main-SendThread(ZK-Host:2181)] 
> zookeeper.ClientCnxn: Can not get the principle name from server ZK-Host
> 2015-09-26 15:25:19,874 INFO [main-SendThread(ZK-Host:2181)] 
> zookeeper.ClientCnxn: Opening socket connection to server ZK-Host/ZK-IP:2181. 
> Will not attempt to authenticate using SASL (unknown error)
> {code}
> > Since HM1 was not able to connect to any ZK, so session timeout didnt 
> > happen at Zookeeper server side and HM1 didnt abort.
> > On Zookeeper session timeout standby master (HM2) registered himself as an 
> > active master. 
> > HM2 is keep on waiting for region server to report him as part of active 
> > master intialization.
> {noformat} 
> 2015-09-26 15:24:44,928 | INFO | HM2-Host:21300.activeMasterManager | Waiting 
> for region servers count to settle; currently checked in 0, slept for 0 ms, 
> expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval 
> of 1500 ms. | 
> org.apache.hadoop.hbase.master.ServerManager.waitForRegionServers(ServerManager.java:1011)
> ---
> ---
> 2015-09-26 15:32:50,841 | INFO | HM2-Host:21300.activeMasterManager | Waiting 
> for region servers count to settle; currently checked in 0, slept for 483913 
> ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, 
> interval of 1500 ms. | 
> org.apache.hadoop.hbase.master.ServerManager.waitForRegionServers(ServerManager.java:1011)
> {noformat}
> > At other end, region servers are reporting to HM1 on 3 sec interval. Here 
> > region server retrieve master location from zookeeper only when they 
> > couldn't connect to Master (ServiceException).
> Region Server will not report HM2 as per current design until unless HM1 
> abort,so HM2 will exit(InitializationMonitor) and again wait for region 
> servers in loop.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to