[ https://issues.apache.org/jira/browse/HBASE-19694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16318852#comment-16318852 ]

stack commented on HBASE-19694:
-------------------------------

Just ran into this.

2018-01-09 10:04:45,463 INFO  [master/ve0524.halxg.cloudera.com/10.17.240.20:16000] zookeeper.ReadOnlyZKClient: Start read only zookeeper connection 0x2555c2c0 to ve0524.halxg.cloudera.com:2222, session timeout 90000 ms, retries 30, retry interval 1000 ms, keep alive 60000 ms
2018-01-09 10:04:45,469 INFO  [ReadOnlyZKClient] zookeeper.ZooKeeper: Initiating client connection, connectString=ve0524.halxg.cloudera.com:2222 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$$Lambda$43/1505185046@1cb5c844
2018-01-09 10:04:45,470 INFO  [ReadOnlyZKClient-SendThread(ve0524.halxg.cloudera.com:2222)] zookeeper.ClientCnxn: Opening socket connection to server ve0524.halxg.cloudera.com/10.17.240.20:2222. Will not attempt to authenticate using SASL (unknown error)
2018-01-09 10:04:45,470 INFO  [ReadOnlyZKClient-SendThread(ve0524.halxg.cloudera.com:2222)] zookeeper.ClientCnxn: Socket connection established to ve0524.halxg.cloudera.com/10.17.240.20:2222, initiating session
2018-01-09 10:04:45,473 DEBUG [main-EventThread] zookeeper.ZKWatcher: master:16000-0x160dc1881df0000, quorum=ve0524.halxg.cloudera.com:2222, baseZNode=/stack2 Received ZooKeeper Event, type=NodeCreated, state=SyncConnected, path=/stack2/master
2018-01-09 10:04:45,480 INFO  [ReadOnlyZKClient-SendThread(ve0524.halxg.cloudera.com:2222)] zookeeper.ClientCnxn: Session establishment complete on server ve0524.halxg.cloudera.com/10.17.240.20:2222, sessionid = 0x160dc1881df0001, negotiated timeout = 90000
2018-01-09 10:04:45,482 INFO  [ve0524:16000.masterManager] master.ActiveMasterManager: Deleting ZNode for /stack2/backup-masters/ve0524.halxg.cloudera.com,16000,1515521082913 from backup master directory
2018-01-09 10:04:45,483 DEBUG [main-EventThread] master.ActiveMasterManager: A master is now available
2018-01-09 10:04:45,490 WARN  [master/ve0524.halxg.cloudera.com/10.17.240.20:16000] client.ConnectionImplementation: Retrieve cluster id failed
java.util.concurrent.ExecutionException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /stack2/hbaseid
  at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
  at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
  at org.apache.hadoop.hbase.client.ConnectionImplementation.retrieveClusterId(ConnectionImplementation.java:518)
  at org.apache.hadoop.hbase.client.ConnectionImplementation.<init>(ConnectionImplementation.java:286)
  at org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.<init>(ConnectionUtils.java:141)
  at org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.<init>(ConnectionUtils.java:132)
  at org.apache.hadoop.hbase.client.ConnectionUtils.createShortCircuitConnection(ConnectionUtils.java:185)
  at org.apache.hadoop.hbase.regionserver.HRegionServer.createClusterConnection(HRegionServer.java:775)
  at org.apache.hadoop.hbase.regionserver.HRegionServer.setupClusterConnection(HRegionServer.java:806)
  at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:821)
  at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:932)
  at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:546)
  at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /stack2/hbaseid
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
  at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$ZKTask$1.exec(ReadOnlyZKClient.java:163)
  at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient.run(ReadOnlyZKClient.java:311)
  ... 1 more
2018-01-09 10:04:45,492 DEBUG [master/ve0524.halxg.cloudera.com/10.17.240.20:16000] client.ConnectionImplementation: clusterid came back null, using default default-cluster
2018-01-09 10:04:45,497 DEBUG [main-EventThread] zookeeper.ZKWatcher: master:16000-0x160dc1881df0000, quorum=ve0524.halxg.cloudera.com:2222, baseZNode=/stack2 Received ZooKeeper Event, type=NodeDeleted, state=SyncConnected, path=/stack2/backup-masters/ve0524.halxg.cloudera.com,16000,1515521082913
2018-01-09 10:04:45,499 INFO  [ve0524:16000.masterManager] master.ActiveMasterManager: Registered Active Master=ve0524.halxg.cloudera.com,16000,1515521082913
2018-01-09 10:04:45,502 DEBUG [master/ve0524.halxg.cloudera.com/10.17.240.20:16000] util.ResourceLeakDetectorFactory: Loaded default ResourceLeakDetector: org.apache.hadoop.hbase.shaded.io.netty.util.ResourceLeakDetector@50c3df8a
2018-01-09 10:04:45,505 INFO  [ve0524:16000.masterManager] regionserver.ChunkCreator: Allocating MemStoreChunkPool with chunk size 2 MB, max count 3145, initial count 0
2018-01-09 10:04:45,511 DEBUG [master/ve0524.halxg.cloudera.com/10.17.240.20:16000] ipc.AbstractRpcClient: Codec=org.apache.hadoop.hbase.codec.KeyValueCodec@22ff2b54, compressor=null, tcpKeepAlive=true, tcpNoDelay=true, connectTO=10000, readTO=20000, writeTO=60000, minIdleTimeBeforeClose=120000, maxRetries=0, fallbackAllowed=false, bind address=null
2018-01-09 10:04:45,571 ERROR [ve0524:16000.masterManager] master.HMaster: Failed to become active master
java.net.ConnectException: Call From ve0524.halxg.cloudera.com/10.17.240.20 to ve0524.halxg.cloudera.com:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

> The initialization order for a fresh cluster is incorrect
> ---------------------------------------------------------
>
>                 Key: HBASE-19694
>                 URL: https://issues.apache.org/jira/browse/HBASE-19694
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Duo Zhang
>            Assignee: stack
>            Priority: Critical
>             Fix For: 2.0.0-beta-2
>
>
> The cluster id is set once we become the active master, in 
> finishActiveMasterInitialization. But blockUntilBecomingActiveMaster and 
> finishActiveMasterInitialization are both called in a separate thread so 
> that the constructor of HMaster returns without blocking. Since HMaster is 
> itself also an HRegionServer, it creates a Connection and then starts 
> calling reportForDuty. When creating the ConnectionImplementation, we read 
> the cluster id from zk, but the id may not have been set yet because it is 
> set in another thread. In that case we get an exception and use the default 
> cluster id instead.
> I always get this when running UTs that start a mini cluster:
> {noformat}
> 2018-01-03 15:16:37,916 WARN  [M:0;zhangduo-ubuntu:32848] client.ConnectionImplementation(528): Retrieve cluster id failed
> java.util.concurrent.ExecutionException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/hbaseid
>       at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
>       at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>       at org.apache.hadoop.hbase.client.ConnectionImplementation.retrieveClusterId(ConnectionImplementation.java:526)
>       at org.apache.hadoop.hbase.client.ConnectionImplementation.<init>(ConnectionImplementation.java:286)
>       at org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.<init>(ConnectionUtils.java:141)
>       at org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.<init>(ConnectionUtils.java:137)
>       at org.apache.hadoop.hbase.client.ConnectionUtils.createShortCircuitConnection(ConnectionUtils.java:185)
>       at org.apache.hadoop.hbase.regionserver.HRegionServer.createClusterConnection(HRegionServer.java:781)
>       at org.apache.hadoop.hbase.regionserver.HRegionServer.setupClusterConnection(HRegionServer.java:812)
>       at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:827)
>       at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:938)
>       at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:550)
>       at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/hbaseid
>       at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
>       at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>       at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$ZKTask$1.exec(ReadOnlyZKClient.java:163)
>       at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient.run(ReadOnlyZKClient.java:311)
>       ... 1 more
> {noformat}
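
For anyone following along, the ordering problem in the description can be reproduced in miniature. The sketch below is only an illustration, not HBase code and not the fix that went into this issue: the class ClusterIdRaceSketch, its methods, and the in-memory map standing in for ZooKeeper are all hypothetical. It shows one thread reading the id znode before another thread has written it (and falling back to default-cluster, as in the logs above), and one possible way to order the two steps by waiting on a future, assuming the reader is allowed to block.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the race in this issue; none of these names are HBase APIs.
class ClusterIdRaceSketch {
  static final String DEFAULT_CLUSTER_ID = "default-cluster";
  static final String ID_ZNODE = "/hbase/hbaseid";

  // Pretend ZooKeeper: znode path -> data.
  final Map<String, String> zk = new ConcurrentHashMap<>();
  // Completed by the "active master" thread once the id znode has been written.
  final CompletableFuture<String> clusterIdPublished = new CompletableFuture<>();

  // What the active-master initialization thread does, possibly late.
  void publishClusterId(String id) {
    zk.put(ID_ZNODE, id);
    clusterIdPublished.complete(id);
  }

  // Racy read: a missing znode silently yields the default id.
  String readClusterIdNow() {
    String id = zk.get(ID_ZNODE);
    return id != null ? id : DEFAULT_CLUSTER_ID;
  }

  // Ordered read: block (with a bound) until the id has been published.
  String readClusterIdAfterPublish(long timeoutMs) throws Exception {
    return clusterIdPublished.get(timeoutMs, TimeUnit.MILLISECONDS);
  }

  public static void main(String[] args) throws Exception {
    ClusterIdRaceSketch s = new ClusterIdRaceSketch();
    // Nothing has been published yet, so the racy read falls back.
    System.out.println("racy read:    " + s.readClusterIdNow());
    Thread master = new Thread(() -> s.publishClusterId("cluster-1234"));
    master.start();
    // The ordered read waits for the master thread instead of guessing.
    System.out.println("ordered read: " + s.readClusterIdAfterPublish(5000));
    master.join();
  }
}
```

Whether blocking like this is acceptable in the real HMaster/HRegionServer startup path is a separate question; the sketch only makes the ordering dependency explicit.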



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
