[ https://issues.apache.org/jira/browse/HBASE-8924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Elliott Clark updated HBASE-8924:
---------------------------------
Attachment: hbase-hbase-master-a1805.halxg.cloudera.com.log.gz
Here's the log that contains the failed restart.
Here's the log from the test trying to bring the master back up.
{code}
2013-07-10 18:02:06,423 INFO [pool-1-thread-4] hbase.ClusterManager: Executed remote command, exit code:0 , output:
2013-07-10 18:02:06,424 INFO [pool-1-thread-4] util.ChaosMonkey: Killed master server:a1805.halxg.cloudera.com,60000,1373500144613
2013-07-10 18:02:06,424 INFO [pool-1-thread-4] util.ChaosMonkey: Sleeping for:0
2013-07-10 18:02:06,424 INFO [pool-1-thread-4] util.ChaosMonkey: Starting master:a1805.halxg.cloudera.com
2013-07-10 18:02:06,424 INFO [pool-1-thread-4] hbase.HBaseCluster: Starting Master on: a1805.halxg.cloudera.com
2013-07-10 18:02:06,424 INFO [pool-1-thread-4] hbase.ClusterManager: Executing remote command: /opt/hbase/current/bin/../bin/hbase-daemon.sh start master , hostname:a1805.halxg.cloudera.com
2013-07-10 18:02:06,425 INFO [pool-1-thread-4] util.Shell: Executing full command [/usr/bin/ssh -o ConnectTimeout=1 -o StrictHostKeyChecking=no a1805.halxg.cloudera.com "/opt/hbase/current/bin/../bin/hbase-daemon.sh start master"]
2013-07-10 18:02:06,426 WARN [pool-1-thread-7] client.HConnectionManager$HConnectionImplementation: Checking master connection
com.google.protobuf.ServiceException: org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException: This server is in the failed servers list: a1805.halxg.cloudera.com/10.20.200.105:60000
	at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1589)
	at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1630)
	at org.apache.hadoop.hbase.protobuf.generated.MasterMonitorProtos$MasterMonitorService$BlockingStub.isMasterRunning(MasterMonitorProtos.java:3021)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$MasterMonitorServiceState.isMasterRunning(HConnectionManager.java:1273)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.isKeepAliveMasterConnectedAndRunning(HConnectionManager.java:1916)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getKeepAliveMasterMonitorService(HConnectionManager.java:1866)
	at org.apache.hadoop.hbase.client.HBaseAdmin.execute(HBaseAdmin.java:2682)
	at org.apache.hadoop.hbase.client.HBaseAdmin.getClusterStatus(HBaseAdmin.java:1945)
	at org.apache.hadoop.hbase.mttr.IntegrationTestMTTR$AdminCallable.doAction(IntegrationTestMTTR.java:470)
	at org.apache.hadoop.hbase.mttr.IntegrationTestMTTR$TimingCallable.call(IntegrationTestMTTR.java:370)
	at org.apache.hadoop.hbase.mttr.IntegrationTestMTTR$TimingCallable.call(IntegrationTestMTTR.java:353)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
	at java.util.concurrent.FutureTask.run(FutureTask.java:166)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException: This server is in the failed servers list: a1805.halxg.cloudera.com/10.20.200.105:60000
	at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:828)
	at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1455)
	at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1347)
	at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1573)
	... 15 more
{code}
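The WARN entry in the log comes from the IntegrationTestMTTR admin client: RpcClient refused to connect because the master's address was on its failed servers list. As a rough illustration of what such a list does, here is a simplified, hypothetical model: after a connection to an address fails, the address is remembered for a fixed window, and any attempt to connect to it during that window fails immediately instead of going over the network. The class and method names, the caller-supplied timestamps, and the 2000 ms window are assumptions for illustration; this is not HBase's RpcClient code.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified model of a client-side "failed servers list".
 * NOT HBase's RpcClient implementation; names and the expiry window are illustrative.
 */
public class FailedServersSketch {
    private final long expiryMs;
    // address -> time (ms) until which connection attempts are refused
    private final Map<String, Long> failedUntil = new HashMap<>();

    public FailedServersSketch(long expiryMs) {
        this.expiryMs = expiryMs;
    }

    /** Record a failed connection attempt to address made at time nowMs. */
    public synchronized void addToFailedServers(String address, long nowMs) {
        failedUntil.put(address, nowMs + expiryMs);
    }

    /** True if address failed recently and its entry has not yet expired. */
    public synchronized boolean isFailedServer(String address, long nowMs) {
        Long until = failedUntil.get(address);
        if (until == null) {
            return false;
        }
        if (nowMs >= until) {
            failedUntil.remove(address); // window elapsed: allow new attempts
            return false;
        }
        return true;
    }

    /** Fail fast, mirroring the "failed servers list" error seen in the log. */
    public void checkBeforeConnect(String address, long nowMs) {
        if (isFailedServer(address, nowMs)) {
            throw new IllegalStateException(
                "This server is in the failed servers list: " + address);
        }
    }

    public static void main(String[] args) {
        FailedServersSketch fs = new FailedServersSketch(2000); // assumed 2000 ms window
        String master = "a1805.halxg.cloudera.com/10.20.200.105:60000";
        fs.addToFailedServers(master, 0);                   // connection to killed master failed at t=0
        System.out.println(fs.isFailedServer(master, 500));  // true: still inside the window
        System.out.println(fs.isFailedServer(master, 2500)); // false: entry expired
    }
}
```

Passing the current time in as a parameter keeps the sketch deterministic; a real client would read a clock instead.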
> Master Can fail to come up after chaos monkey if the sleep time is too short.
> -----------------------------------------------------------------------------
>
> Key: HBASE-8924
> URL: https://issues.apache.org/jira/browse/HBASE-8924
> Project: HBase
> Issue Type: Bug
> Components: test
> Reporter: Elliott Clark
> Assignee: Elliott Clark
> Attachments: hbase-hbase-master-a1805.halxg.cloudera.com.log.gz
>
>
> On a real cluster the master won't come up if the sleep time between killing
> and starting is too short.
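The log shows ChaosMonkey sleeping for 0 ms between killing the master and issuing the start command. One way a restart action could avoid depending on a fixed sleep is to poll for a readiness condition (for example, that the old master process has exited) with a timeout before starting. The sketch below is a generic, hypothetical polling helper under that assumption; it is not the fix committed for HBASE-8924, and the condition in `main` is a timing stand-in rather than a real process check.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper: poll a condition instead of sleeping a fixed amount. */
public class WaitUntil {
    /**
     * Returns true if condition becomes true within timeoutMs, checking every pollMs.
     * Returns false on timeout or if the waiting thread is interrupted.
     */
    public static boolean waitUntil(BooleanSupplier condition, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            // Overflow-safe comparison of nanoTime values
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status for the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for "has the old master process exited?": reports true after ~300 ms.
        long killedAt = System.currentTimeMillis();
        BooleanSupplier oldMasterGone = () -> System.currentTimeMillis() - killedAt >= 300;

        if (waitUntil(oldMasterGone, 5_000, 100)) {
            System.out.println("old master exited; safe to issue start");
        } else {
            System.out.println("timed out waiting for old master to exit");
        }
    }
}
```

Polling with a timeout bounds the total wait while adapting to however long the old process actually takes to go away, whereas a fixed sleep is either wasted time or, as the issue describes, too short.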