[
https://issues.apache.org/jira/browse/BIGTOP-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889189#comment-13889189
]
Mikhail Antonov edited comment on BIGTOP-1192 at 2/3/14 3:26 AM:
-----------------------------------------------------------------
The purpose of all that is to enable smoke tests which validate that, after
some process running on a node fails, the cluster service remains available,
e.g. failover/quorum etc. are working - right? In that sense, if hadoop
services are also watchdogged on Ubuntu, this particular "kill -9" test
doesn't help much (it will not fail, but it will do nothing meaningful
either), and a restart-based test would be better used instead.
(Generally, we should probably also add a failure which reboots or shuts down
the whole box.)
was (Author: mantonov):
The purpose of all that is to enable smoke tests which validate that, after
some process running on a node fails, the cluster service remains available,
e.g. failover/quorum etc. are working - right? In that sense, with the
behavior we've seen on Ubuntu, this particular "kill -9" test doesn't help
much, and a restart-based test should be used instead.
(Generally, we should probably also add a failure which reboots or shuts down
the whole box.)
> Add utilities to facilitate cluster failure testing into bigtop-test-framework
> ------------------------------------------------------------------------------
>
> Key: BIGTOP-1192
> URL: https://issues.apache.org/jira/browse/BIGTOP-1192
> Project: Bigtop
> Issue Type: New Feature
> Components: Tests
> Affects Versions: 0.7.0
> Reporter: Mikhail Antonov
> Assignee: Mikhail Antonov
> Labels: itest, smokes
> Fix For: 0.8.0
>
> Attachments: BIGTOP-1192.1.patch, BIGTOP-1192.2.patch,
> BIGTOP-1192.3.patch, BIGTOP-1192.4.patch, BIGTOP-1192.patch, BIGTOP-1192.patch
>
>
> The goal is to provide Bigtop module maintainers with a set of util classes
> to help develop smoke tests that can simulate certain failures during smoke
> test execution on a cluster.
> Summary of what is provided in the current patch.
> The following failure types are supported now:
> - Service stopped and restarted (on given set of nodes)
> - Service killed with 'kill -9' and started back up (on given set of nodes)
> - Node inbound/outbound connections are shut down and brought back up (via
> iptables).
>
> System requirements to run smoke tests with failures:
> * password-less (PKI-based) root ssh to all nodes in the cluster being tested
> is assumed.
> * for local tests, like ClusterFailuresTest, one should have password-less
> root ssh to localhost.
> * env variable BIGTOP_SMOKES_CLUSTER_IDENTITY_FILE should point to
> the corresponding private key file.
> Further thoughts (not included in this patch)
> Cluster provisioning
> - The Bigtop test framework (the failures part of it) doesn't need to know
> about cluster topology, as it simply executes a set of SSH commands on remote
> hosts (whose addresses are provided by the developer of the specific module's
> smoke test). But the actual tests do need to know about cluster topology to
> run sophisticated failure scenarios.
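
To make the three failure types and the ssh requirement in the description
above more concrete, here is a rough sketch of how each failure could map to a
pair of "inject" and "restore" shell commands sent over root ssh. This is not
the code from the patch: the function names, the pkill pattern argument, and
the per-peer iptables rules are all assumptions chosen for illustration. Only
the BIGTOP_SMOKES_CLUSTER_IDENTITY_FILE variable comes from the description.

```python
import os
import shlex

# All names below are hypothetical; this is not the bigtop-test-framework API.
IDENTITY_ENV = "BIGTOP_SMOKES_CLUSTER_IDENTITY_FILE"


def ssh_argv(host, remote_cmd, identity_file=None):
    """Build the argv that runs remote_cmd as root on host over ssh."""
    identity = identity_file or os.environ.get(IDENTITY_ENV)
    if not identity:
        raise ValueError(IDENTITY_ENV + " is not set")
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    return ["ssh", "-i", identity, "-o", "BatchMode=yes",
            "root@" + host, remote_cmd]


def restart_failure(service):
    """(inject, restore) commands: stop the service, then start it again."""
    name = shlex.quote(service)
    return ("service %s stop" % name, "service %s start" % name)


def kill_failure(service, process_pattern):
    """(inject, restore) commands: kill -9 matching processes, then start."""
    return ("pkill -9 -f %s" % shlex.quote(process_pattern),
            "service %s start" % shlex.quote(service))


def network_failure(peer):
    """(inject, restore) commands: drop, then re-allow, traffic with peer.

    Assumption: rules are scoped to one peer address so that the ssh
    session used to restore connectivity is not cut off as well.
    """
    p = shlex.quote(peer)
    rule = "iptables %s INPUT -s %s -j DROP && iptables %s OUTPUT -d %s -j DROP"
    return (rule % ("-A", p, "-A", p), rule % ("-D", p, "-D", p))


if __name__ == "__main__":
    inject, restore = restart_failure("hadoop-hdfs-namenode")
    for cmd in (inject, restore):
        print(" ".join(ssh_argv("node1.example.com", cmd, "/path/to/key")))
```

A test would run the inject command on the chosen nodes, check that the
cluster service is still available, and then run the restore command. As the
comment at the top of this message points out, where a watchdog restarts
killed processes, the kill-based pair may not exercise anything, so the
restart-based pair is likely the better choice there.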