[ 
https://issues.apache.org/jira/browse/BIGTOP-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889189#comment-13889189
 ] 

Mikhail Antonov edited comment on BIGTOP-1192 at 2/3/14 3:26 AM:
-----------------------------------------------------------------

The purpose of all this is to enable smoke tests which validate that, after 
the failure of some process running on a node, the cluster service remains 
available (e.g. that failover, quorum, etc. are working), right? In that sense, 
if Hadoop services are also watchdogged on Ubuntu, this particular "kill -9" 
test doesn't help much (it will not fail, but will do nothing meaningful), and 
a restart-based test would be better used instead.

(Generally, we should probably also add a failure which reboots or shuts down 
the whole box.)


was (Author: mantonov):
The purpose of all this is to enable smoke tests which validate that, after 
the failure of some process running on a node, the cluster service remains 
available (e.g. that failover, quorum, etc. are working), right? In that sense, 
with the behavior we've seen on Ubuntu, this particular "kill -9" test doesn't 
help much, and a restart-based test should be used instead.

(Generally, we should probably also add a failure which reboots or shuts down 
the whole box.)

> Add utilities to facilitate cluster failure testing into bigtop-test-framework
> ------------------------------------------------------------------------------
>
>                 Key: BIGTOP-1192
>                 URL: https://issues.apache.org/jira/browse/BIGTOP-1192
>             Project: Bigtop
>          Issue Type: New Feature
>          Components: Tests
>    Affects Versions: 0.7.0
>            Reporter: Mikhail Antonov
>            Assignee: Mikhail Antonov
>              Labels: itest, smokes
>             Fix For: 0.8.0
>
>         Attachments: BIGTOP-1192.1.patch, BIGTOP-1192.2.patch, 
> BIGTOP-1192.3.patch, BIGTOP-1192.4.patch, BIGTOP-1192.patch, BIGTOP-1192.patch
>
>
> The goal is to provide Bigtop module maintainers with a set of utility 
> classes to help develop smoke tests able to simulate certain failures during 
> smoke test execution on a cluster.
> Summary of what is provided in the current patch. 
> The following failure types are currently supported:
>  - Service stopped and restarted (on given set of nodes)
>  - Service killed with 'kill -9' and started back up (on given set of nodes)
>  - Node inbound/outbound connections are shut down and brought back up (via 
> iptables).
>  
> System requirements to run smoke tests with failures:
>  *  password-less (PKI-based) root ssh to all nodes in the cluster being 
> tested is assumed.
>  *  for local tests, like ClusterFailuresTest, one should have password-less 
> root ssh to localhost.
>  *  env variable BIGTOP_SMOKES_CLUSTER_IDENTITY_FILE should point to the 
> corresponding private key file.
> Further thoughts (not included in this patch)
>   Cluster provisioning
>    - The Bigtop test framework (the failures part of it) doesn't need to know 
> about cluster topology, as it simply executes a set of SSH commands on remote 
> hosts (whose addresses are provided by the developer of the specific module's 
> smoke test). But the actual tests do need to know about cluster topology to 
> run sophisticated failure scenarios.
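
For illustration only, the three failure types listed in the description could 
be expressed as pairs of remote shell commands sent over ssh. This is a rough 
Python sketch, not the code from the attached patches: the function names, the 
use of pkill, and the per-peer iptables rules are all assumptions made for the 
example. Only the BIGTOP_SMOKES_CLUSTER_IDENTITY_FILE variable and the 
root-ssh requirement come from the issue description.

```python
import os
import shlex

# Environment variable named in the issue description.
IDENTITY_ENV = "BIGTOP_SMOKES_CLUSTER_IDENTITY_FILE"


def ssh_command(host, remote_cmd, identity_file=None):
    """Build the argv for running remote_cmd as root on host over ssh."""
    identity_file = identity_file or os.environ.get(IDENTITY_ENV)
    argv = ["ssh"]
    if identity_file:
        argv += ["-i", identity_file]
    return argv + ["root@" + host, remote_cmd]


# Each hypothetical helper returns (inject, restore): the command that
# causes the failure and the command that undoes it.

def restart_failure(service):
    name = shlex.quote(service)
    return ("service %s stop" % name, "service %s start" % name)


def kill_failure(service):
    # Assumption: the service name also matches the process command line.
    name = shlex.quote(service)
    return ("pkill -9 -f %s" % name, "service %s start" % name)


def network_failure(peer):
    # Assumption: traffic is dropped only to/from one peer host, so the
    # ssh session used to restore connectivity stays usable.
    peer = shlex.quote(peer)
    rule = ("iptables -{op} INPUT -s {p} -j DROP && "
            "iptables -{op} OUTPUT -d {p} -j DROP")
    return (rule.format(op="A", p=peer), rule.format(op="D", p=peer))


if __name__ == "__main__":
    inject, restore = kill_failure("hadoop-hdfs-datanode")
    for cmd in (inject, restore):
        # Printed rather than executed; the argv could be handed to
        # subprocess.run on a machine with root ssh access to the node.
        print(ssh_command("node1.example.com", cmd, "/path/to/key"))
```

If a watchdog restarts the service on its own after "kill -9", the restore 
command in kill_failure would presumably be redundant, which is the case where 
the restart-based pair is likely the more meaningful test.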



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
