Dima Spivak created HBASE-12852:
-----------------------------------
Summary: Tests from hbase-it that use ChaosMonkey don't fail if
SSH commands fail
Key: HBASE-12852
URL: https://issues.apache.org/jira/browse/HBASE-12852
Project: HBase
Issue Type: Bug
Components: integration tests
Affects Versions: 0.98.6
Reporter: Dima Spivak
Assignee: Dima Spivak
I've just started rolling my sleeves up and playing about with hbase-it (at the
moment, only on 0.98.6), but wanted to begin filing JIRAs for issues I
encounter so that I don't forget to get to them. First up: tests run with
ChaosMonkey don't fail when ChaosMonkey itself fails to carry out its actions.
As an example, while running IntegrationTestIngest with a slowDeterministic CM,
I forgot to set up SSH properly and saw the following:
{code}
15/01/14 07:36:53 WARN hbase.ClusterManager: Remote command: ps aux | grep proc_regionserver | grep -v grep | tr -s ' ' | cut -d ' ' -f2 | xargs kill -s SIGKILL , hostname:node-5.internal failed at attempt 4. Retrying until maxAttempts: 5. Exception: stderr: Permission denied, please try again.
Permission denied, please try again.
Permission denied (publickey,password).
, stdout:
15/01/14 07:36:53 INFO util.RetryCounter: Sleeping 16000ms before retry #4...
15/01/14 07:36:53 INFO zookeeper.ZooKeeper: Session: 0x14ae74d7bac006b closed
15/01/14 07:36:53 INFO policies.Policy: Sleeping for: 59541
15/01/14 07:36:53 INFO zookeeper.ClientCnxn: EventThread shut down
Failed to write keys: 0
Key range: [150000..159999]
Batch updates: false
Percent of keys to update: 60
Updater threads: 10
Ignore nonce conflicts: true
Regions per server: 5
15/01/14 07:36:56 INFO util.LoadTestTool: Starting to mutate data...
Starting to mutate data...
15/01/14 07:36:57 INFO policies.Policy: Sleeping for: 88816
15/01/14 07:37:01 INFO util.MultiThreadedAction: [U:10] Keys=471, cols=5.7 K, time=00:00:05 Overall: [keys/s= 94, latency=102 ms] Current: [keys/s=94, latency=102 ms], wroteUpTo=149999
15/01/14 07:37:06 INFO util.MultiThreadedAction: [U:10] Keys=908, cols=11.0 K, time=00:00:10 Overall: [keys/s= 90, latency=90 ms] Current: [keys/s=87, latency=77 ms], wroteUpTo=149999
15/01/14 07:37:09 INFO hbase.ClusterManager: Executing remote command: ps aux | grep proc_regionserver | grep -v grep | tr -s ' ' | cut -d ' ' -f2 | xargs kill -s SIGKILL , hostname:node-5.internal
15/01/14 07:37:09 INFO util.Shell: Executing full command [/usr/bin/ssh node-5.internal "ps aux | grep proc_regionserver | grep -v grep | tr -s ' ' | cut -d ' ' -f2 | xargs kill -s SIGKILL"]
15/01/14 07:37:09 WARN policies.Policy: Exception occured during performing action: ExitCodeException exitCode=255: stderr: Permission denied, please try again.
Permission denied, please try again.
Permission denied (publickey,password).
, stdout:
        at org.apache.hadoop.hbase.HBaseClusterManager.exec(HBaseClusterManager.java:208)
        at org.apache.hadoop.hbase.HBaseClusterManager.execWithRetries(HBaseClusterManager.java:223)
        at org.apache.hadoop.hbase.HBaseClusterManager.signal(HBaseClusterManager.java:268)
        at org.apache.hadoop.hbase.ClusterManager.kill(ClusterManager.java:97)
        at org.apache.hadoop.hbase.DistributedHBaseCluster.killRegionServer(DistributedHBaseCluster.java:110)
        at org.apache.hadoop.hbase.chaos.actions.Action.killRs(Action.java:84)
        at org.apache.hadoop.hbase.chaos.actions.RestartActionBaseAction.restartRs(RestartActionBaseAction.java:50)
        at org.apache.hadoop.hbase.chaos.actions.RestartRsHoldingMetaAction.perform(RestartRsHoldingMetaAction.java:38)
        at org.apache.hadoop.hbase.chaos.policies.DoActionsOncePolicy.runOneIteration(DoActionsOncePolicy.java:50)
        at org.apache.hadoop.hbase.chaos.policies.PeriodicPolicy.run(PeriodicPolicy.java:41)
        at org.apache.hadoop.hbase.chaos.policies.CompositeSequentialPolicy.run(CompositeSequentialPolicy.java:42)
        at java.lang.Thread.run(Thread.java:745)
{code}
It seems to me that tests should fail in these instances rather than just log
a warning. Was this just an oversight, [~enis] and [~ndimiduk], or is this by
design?
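
To make the request concrete, here is a rough sketch of the behaviour I have in mind, not a patch. ChaosAction and FailureTrackingPolicy are hypothetical stand-ins I made up for illustration; they are not the real classes in org.apache.hadoop.hbase.chaos, and the actual fix may need to look quite different. The idea is only that a failed action is still logged as a warning but is also recorded, so the test can check the record once the workload finishes.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a chaos action; NOT the HBase Action class. */
interface ChaosAction {
    void perform() throws Exception;
}

/**
 * Hypothetical policy wrapper (an assumption for illustration only).
 * It keeps the current "log and carry on" behaviour, but also counts
 * failed actions so the enclosing test can fail afterwards.
 */
class FailureTrackingPolicy {
    private final AtomicInteger failures = new AtomicInteger();

    /** Runs one action; an exception is logged and counted, not rethrown. */
    void runOne(ChaosAction action) {
        try {
            action.perform();
        } catch (Exception e) {
            failures.incrementAndGet();
            System.err.println("WARN chaos action failed: " + e);
        }
    }

    /** Number of actions that threw so far. */
    int failureCount() {
        return failures.get();
    }

    /** Meant to be called by the test after the workload; fails if any action failed. */
    void assertNoFailures() {
        int n = failures.get();
        if (n > 0) {
            throw new AssertionError(n + " chaos action(s) failed; see WARN log entries");
        }
    }
}
```

With something along these lines, the SSH "Permission denied" case above would still let the ingest workload run to completion, but the test would be reported as failed at the end instead of passing silently. Whether to fail at the end or abort immediately is a separate design question.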