[ https://issues.apache.org/jira/browse/HBASE-12852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15819325#comment-15819325 ]
Enis Soztutar commented on HBASE-12852:
---------------------------------------
The test should not fail just because a ChaosMonkey (CM) action fails (for
example, failing to stop the regionserver for whatever reason). But we can run
a basic smoke test of the CM setup (making sure SSH works, etc.) before the
test starts, and fail the test if that smoke test fails.
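
As a rough illustration of such a smoke test, here is a minimal sketch in
plain Java. It is not existing HBase code: the class ChaosMonkeySmokeCheck and
its methods sshProbeCommand and commandSucceeds are hypothetical names, and it
assumes an OpenSSH client on the PATH. It runs a no-op command on the target
host over SSH and reports whether that worked, so the test setup could fail
early instead of logging "Permission denied" warnings for the whole run.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of hbase-it.
final class ChaosMonkeySmokeCheck {

    private ChaosMonkeySmokeCheck() {}

    // Builds an ssh invocation that runs the no-op "true" on the host.
    // BatchMode=yes makes ssh fail instead of prompting for a password.
    static List<String> sshProbeCommand(String hostname) {
        return Arrays.asList(
            "ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=10",
            hostname, "true");
    }

    // Returns true only if the command starts, finishes within the
    // timeout, and exits with status 0.
    static boolean commandSucceeds(List<String> command, long timeoutSeconds) {
        Process process;
        try {
            process = new ProcessBuilder(command).inheritIO().start();
        } catch (IOException e) {
            return false; // the command could not be launched at all
        }
        try {
            if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                process.destroyForcibly();
                return false; // timed out
            }
            return process.exitValue() == 0;
        } catch (InterruptedException e) {
            process.destroyForcibly();
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A test's setup step could call
commandSucceeds(sshProbeCommand("node-5.internal"), 30) for each host and
abort if it returns false, while failures of individual CM actions during the
run would still only be logged.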
> Tests from hbase-it that use ChaosMonkey don't fail if SSH commands fail
> ------------------------------------------------------------------------
>
> Key: HBASE-12852
> URL: https://issues.apache.org/jira/browse/HBASE-12852
> Project: HBase
> Issue Type: Bug
> Components: integration tests
> Affects Versions: 0.98.6
> Reporter: Dima Spivak
> Assignee: Dima Spivak
>
> I've just started rolling my sleeves up and playing about with hbase-it (at
> the moment, only on 0.98.6), but wanted to begin filing JIRAs for issues I
> encounter so that I don't forget to get to them. First up is the fact that it
> seems that tests run with ChaosMonkey don't fail when the ChaosMonkey fails
> to work. As an example, while running IntegrationTestIngest with a
> slowDeterministic CM, I forgot to set up SSH properly and saw the following:
> {code}
> 15/01/14 07:36:53 WARN hbase.ClusterManager: Remote command: ps aux | grep
> proc_regionserver | grep -v grep | tr -s ' ' | cut -d ' ' -f2 | xargs kill -s
> SIGKILL , hostname:node-5.internal failed at attempt 4. Retrying until
> maxAttempts: 5. Exception: stderr: Permission denied, please try again.
> Permission denied, please try again.
> Permission denied (publickey,password).
> , stdout:
> 15/01/14 07:36:53 INFO util.RetryCounter: Sleeping 16000ms before retry #4...
> 15/01/14 07:36:53 INFO zookeeper.ZooKeeper: Session: 0x14ae74d7bac006b closed
> 15/01/14 07:36:53 INFO policies.Policy: Sleeping for: 59541
> 15/01/14 07:36:53 INFO zookeeper.ClientCnxn: EventThread shut down
> Failed to write keys: 0
> Key range: [150000..159999]
> Batch updates: false
> Percent of keys to update: 60
> Updater threads: 10
> Ignore nonce conflicts: true
> Regions per server: 5
> 15/01/14 07:36:56 INFO util.LoadTestTool: Starting to mutate data...
> Starting to mutate data...
> 15/01/14 07:36:57 INFO policies.Policy: Sleeping for: 88816
> 15/01/14 07:37:01 INFO util.MultiThreadedAction: [U:10] Keys=471, cols=5.7 K,
> time=00:00:05 Overall: [keys/s= 94, latency=102 ms] Current: [keys/s=94,
> latency=102 ms], wroteUpTo=149999
> 15/01/14 07:37:06 INFO util.MultiThreadedAction: [U:10] Keys=908, cols=11.0
> K, time=00:00:10 Overall: [keys/s= 90, latency=90 ms] Current: [keys/s=87,
> latency=77 ms], wroteUpTo=149999
> 15/01/14 07:37:09 INFO hbase.ClusterManager: Executing remote command: ps aux
> | grep proc_regionserver | grep -v grep | tr -s ' ' | cut -d ' ' -f2 | xargs
> kill -s SIGKILL , hostname:node-5.internal
> 15/01/14 07:37:09 INFO util.Shell: Executing full command [/usr/bin/ssh
> node-5.internal "ps aux | grep proc_regionserver | grep -v grep | tr -s ' ' |
> cut -d ' ' -f2 | xargs kill -s SIGKILL"]
> 15/01/14 07:37:09 WARN policies.Policy: Exception occured during performing
> action: ExitCodeException exitCode=255: stderr: Permission denied, please try
> again.
> Permission denied, please try again.
> Permission denied (publickey,password).
> , stdout:
> at
> org.apache.hadoop.hbase.HBaseClusterManager.exec(HBaseClusterManager.java:208)
> at
> org.apache.hadoop.hbase.HBaseClusterManager.execWithRetries(HBaseClusterManager.java:223)
> at
> org.apache.hadoop.hbase.HBaseClusterManager.signal(HBaseClusterManager.java:268)
> at org.apache.hadoop.hbase.ClusterManager.kill(ClusterManager.java:97)
> at
> org.apache.hadoop.hbase.DistributedHBaseCluster.killRegionServer(DistributedHBaseCluster.java:110)
> at org.apache.hadoop.hbase.chaos.actions.Action.killRs(Action.java:84)
> at
> org.apache.hadoop.hbase.chaos.actions.RestartActionBaseAction.restartRs(RestartActionBaseAction.java:50)
> at
> org.apache.hadoop.hbase.chaos.actions.RestartRsHoldingMetaAction.perform(RestartRsHoldingMetaAction.java:38)
> at
> org.apache.hadoop.hbase.chaos.policies.DoActionsOncePolicy.runOneIteration(DoActionsOncePolicy.java:50)
> at
> org.apache.hadoop.hbase.chaos.policies.PeriodicPolicy.run(PeriodicPolicy.java:41)
> at
> org.apache.hadoop.hbase.chaos.policies.CompositeSequentialPolicy.run(CompositeSequentialPolicy.java:42)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Seems to me that tests should fail in these instances rather than just toss a
> warning. Was this just an oversight, [~enis] and [~ndimiduk], or is this by
> design?