Dima Spivak created HBASE-12852:
-----------------------------------

             Summary: Tests from hbase-it that use ChaosMonkey don't fail if SSH commands fail
                 Key: HBASE-12852
                 URL: https://issues.apache.org/jira/browse/HBASE-12852
             Project: HBase
          Issue Type: Bug
          Components: integration tests
    Affects Versions: 0.98.6
            Reporter: Dima Spivak
            Assignee: Dima Spivak


I've just started rolling up my sleeves and playing with hbase-it (at the 
moment, only on 0.98.6), and I want to file JIRAs for the issues I encounter 
so that I don't forget to get to them. First up: tests run with ChaosMonkey 
don't seem to fail when the ChaosMonkey itself fails to do its work. As an 
example, while running IntegrationTestIngest with a slowDeterministic CM, I 
forgot to set up SSH properly and saw the following:
{code}
15/01/14 07:36:53 WARN hbase.ClusterManager: Remote command: ps aux | grep 
proc_regionserver | grep -v grep | tr -s ' ' | cut -d ' ' -f2 | xargs kill -s 
SIGKILL , hostname:node-5.internal failed at attempt 4. Retrying until 
maxAttempts: 5. Exception: stderr: Permission denied, please try again.
Permission denied, please try again.
Permission denied (publickey,password).
, stdout: 
15/01/14 07:36:53 INFO util.RetryCounter: Sleeping 16000ms before retry #4...
15/01/14 07:36:53 INFO zookeeper.ZooKeeper: Session: 0x14ae74d7bac006b closed
15/01/14 07:36:53 INFO policies.Policy: Sleeping for: 59541
15/01/14 07:36:53 INFO zookeeper.ClientCnxn: EventThread shut down
Failed to write keys: 0
Key range: [150000..159999]
Batch updates: false
Percent of keys to update: 60
Updater threads: 10
Ignore nonce conflicts: true
Regions per server: 5
15/01/14 07:36:56 INFO util.LoadTestTool: Starting to mutate data...
Starting to mutate data...
15/01/14 07:36:57 INFO policies.Policy: Sleeping for: 88816
15/01/14 07:37:01 INFO util.MultiThreadedAction: [U:10] Keys=471, cols=5.7 K, 
time=00:00:05 Overall: [keys/s= 94, latency=102 ms] Current: [keys/s=94, 
latency=102 ms], wroteUpTo=149999
15/01/14 07:37:06 INFO util.MultiThreadedAction: [U:10] Keys=908, cols=11.0 K, 
time=00:00:10 Overall: [keys/s= 90, latency=90 ms] Current: [keys/s=87, 
latency=77 ms], wroteUpTo=149999
15/01/14 07:37:09 INFO hbase.ClusterManager: Executing remote command: ps aux | 
grep proc_regionserver | grep -v grep | tr -s ' ' | cut -d ' ' -f2 | xargs kill 
-s SIGKILL , hostname:node-5.internal
15/01/14 07:37:09 INFO util.Shell: Executing full command [/usr/bin/ssh  
node-5.internal "ps aux | grep proc_regionserver | grep -v grep | tr -s ' ' | 
cut -d ' ' -f2 | xargs kill -s SIGKILL"]
15/01/14 07:37:09 WARN policies.Policy: Exception occured during performing 
action: ExitCodeException exitCode=255: stderr: Permission denied, please try 
again.
Permission denied, please try again.
Permission denied (publickey,password).
, stdout: 
        at 
org.apache.hadoop.hbase.HBaseClusterManager.exec(HBaseClusterManager.java:208)
        at 
org.apache.hadoop.hbase.HBaseClusterManager.execWithRetries(HBaseClusterManager.java:223)
        at 
org.apache.hadoop.hbase.HBaseClusterManager.signal(HBaseClusterManager.java:268)
        at org.apache.hadoop.hbase.ClusterManager.kill(ClusterManager.java:97)
        at 
org.apache.hadoop.hbase.DistributedHBaseCluster.killRegionServer(DistributedHBaseCluster.java:110)
        at org.apache.hadoop.hbase.chaos.actions.Action.killRs(Action.java:84)
        at 
org.apache.hadoop.hbase.chaos.actions.RestartActionBaseAction.restartRs(RestartActionBaseAction.java:50)
        at 
org.apache.hadoop.hbase.chaos.actions.RestartRsHoldingMetaAction.perform(RestartRsHoldingMetaAction.java:38)
        at 
org.apache.hadoop.hbase.chaos.policies.DoActionsOncePolicy.runOneIteration(DoActionsOncePolicy.java:50)
        at 
org.apache.hadoop.hbase.chaos.policies.PeriodicPolicy.run(PeriodicPolicy.java:41)
        at 
org.apache.hadoop.hbase.chaos.policies.CompositeSequentialPolicy.run(CompositeSequentialPolicy.java:42)
        at java.lang.Thread.run(Thread.java:745)
{code}

It seems to me that tests should fail in these instances rather than just log 
a warning. Was this just an oversight, [~enis] and [~ndimiduk], or is it by 
design?
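
To make the suggestion concrete, here is a rough sketch of the behavior I have 
in mind. It is not the hbase-it code: FailureTrackingRunner, ChaosAction, and 
assertNoFailures are hypothetical names made up for illustration, and how this 
would hook into the real Policy classes is an open question. The idea is only 
that a failed action is still logged as a warning, but is also recorded so the 
test can fail once the run is over.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch: none of these names exist in hbase-it.
final class FailureTrackingRunner {

  /** A chaos action that may throw, e.g. when the remote SSH command fails. */
  interface ChaosAction {
    void perform() throws Exception;
  }

  // Thread-safe, because policies run their actions on their own threads.
  private final List<String> failures = new CopyOnWriteArrayList<>();

  /** Runs one action; a failure is logged as a warning and also recorded. */
  void run(String name, ChaosAction action) {
    try {
      action.perform();
    } catch (Exception e) {
      System.err.println("WARN Exception while performing " + name + ": " + e);
      failures.add(name + ": " + e.getMessage());
    }
  }

  /** Returns a snapshot of the failures recorded so far. */
  List<String> failures() {
    return new ArrayList<>(failures);
  }

  /** Meant to be called at the end of the test so broken chaos fails the run. */
  void assertNoFailures() {
    if (!failures.isEmpty()) {
      throw new AssertionError(
          failures.size() + " chaos action(s) failed: " + failures);
    }
  }

  public static void main(String[] args) {
    FailureTrackingRunner runner = new FailureTrackingRunner();
    // Simulates the SSH failure from the log above.
    runner.run("killRs", () -> {
      throw new IOException("Permission denied (publickey,password).");
    });
    try {
      runner.assertNoFailures();
    } catch (AssertionError e) {
      System.out.println("Test would fail: " + e.getMessage());
    }
  }
}
```

Whether a single failed action should fail the whole test, or only some 
threshold of them, is a design choice this sketch does not try to settle.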



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
