[
https://issues.apache.org/jira/browse/BIGTOP-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884993#comment-13884993
]
Roman Shaposhnik commented on BIGTOP-1192:
------------------------------------------
[~mantonov] A couple of high-level comments:
# ssh-ing & the like -- this is where the connection with BIGTOP-635 comes into
play. What I had in mind is that things like NetworkShutdownFailure would
actually utilize a generic cluster manipulation framework instead of explicitly
calling ssh or whatever. Now, I'm not saying that as a first cut ssh is bad --
rather that, instead of calling it directly, there probably should be an
interface (part of BIGTOP-635) whose methods get called to perform these
actions. Otherwise you'll end up with things like new
ServiceKilledFailure(["localhost"], "crond") in real tests -- IOW things that
need to know host names, etc., instead of simply referring to the topology of
the cluster in an abstract manner and calling methods on various nodes that are
part of that topology.
# "should we have bash script for all that setup?" -- absolutely! In fact,
better yet -- we probably should create a puppet module. That's how we set up
our clusters anyway.
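
To make the first point concrete, here is a rough sketch of what such an abstraction might look like. All names in it (ClusterNode, ClusterTopology, RecordingNode, StaticTopology, killOnRole, the "worker"/"master" roles) are hypothetical and are not taken from BIGTOP-635 or from the patch; the in-memory node only records actions so the example runs without a cluster.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical abstraction: none of these names come from BIGTOP-635 or the patch.
interface ClusterNode {
    String role();                      // illustrative role name, e.g. "worker"
    void killService(String service);   // how the kill happens (ssh, agent, ...) stays hidden
}

interface ClusterTopology {
    List<ClusterNode> nodesWithRole(String role);
}

// In-memory fake that records requested actions instead of touching real hosts.
final class RecordingNode implements ClusterNode {
    private final String role;
    final List<String> actions = new ArrayList<>();

    RecordingNode(String role) { this.role = role; }

    @Override public String role() { return role; }

    @Override public void killService(String service) {
        actions.add("kill " + service);
    }
}

// Fixed list of nodes; a real implementation would discover them somehow.
final class StaticTopology implements ClusterTopology {
    private final List<ClusterNode> nodes;

    StaticTopology(ClusterNode... nodes) { this.nodes = Arrays.asList(nodes); }

    @Override public List<ClusterNode> nodesWithRole(String role) {
        List<ClusterNode> matching = new ArrayList<>();
        for (ClusterNode node : nodes) {
            if (node.role().equals(role)) {
                matching.add(node);
            }
        }
        return matching;
    }
}

final class TopologySketch {
    // The test refers to a role in the topology, never to a host name.
    static void killOnRole(ClusterTopology topology, String role, String service) {
        for (ClusterNode node : topology.nodesWithRole(role)) {
            node.killService(service);
        }
    }

    public static void main(String[] args) {
        RecordingNode worker = new RecordingNode("worker");
        RecordingNode master = new RecordingNode("master");
        killOnRole(new StaticTopology(worker, master), "worker", "crond");
        System.out.println("worker: " + worker.actions + ", master: " + master.actions);
    }
}
```

With something along these lines, an ssh-based ClusterNode could still be the first implementation, but the tests themselves would not change if it were later swapped for another mechanism.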
> Add utilities to facilitate cluster failure testing into bigtop-test-framework
> ------------------------------------------------------------------------------
>
> Key: BIGTOP-1192
> URL: https://issues.apache.org/jira/browse/BIGTOP-1192
> Project: Bigtop
> Issue Type: New Feature
> Components: Tests
> Reporter: Mikhail Antonov
> Assignee: Mikhail Antonov
> Labels: bigtop, itest, smokes
> Fix For: 0.8.0
>
> Attachments: BIGTOP-1192.1.patch, BIGTOP-1192.2.patch
>
>
> The goal is to provide Bigtop module maintainers with a set of util
> classes to help develop smoke tests able to simulate certain failures during
> smoke test execution on a cluster.
> Summary of what is provided in the current patch.
> The following failure types are supported now:
> - Service stopped and restarted (on given set of nodes)
> - Service killed with 'kill -9' and started back up (on given set of nodes)
> - Node inbound/outbound connections are shut down and brought back up (via
> iptables).
>
> System requirements to run smoke tests with failures.
> * password-less (PKI-based) root ssh to all nodes in cluster being tested
> is assumed.
> * for local tests, like ClusterFailuresTest, one should have password-less
> root ssh to localhost.
> * env variable BIGTOP_SMOKES_CLUSTER_IDENTITY_FILE should point to
> the corresponding private key file.
> Further thoughts (not included in this patch)
> Cluster provisioning
> - Bigtop test framework (the failures part of it) doesn't need to know about
> cluster topology, as it simply executes a set of SSH commands on remote hosts
> (whose addresses are provided by the developer of the specific module's smoke
> tests). But the actual tests do need to know about
> cluster topology to run sophisticated failure scenarios.
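
For illustration only, the ssh-based approach described in the quoted requirements might boil down to building a command line such as the one below. This is a sketch under assumptions, not the code from the attached patches: the class and method names (SshCommandSketch, sshCommand) and the remote command "service crond stop" are hypothetical, and the program only prints the command instead of running it.

```java
import java.util.ArrayList;
import java.util.List;

final class SshCommandSketch {
    // Builds the argument list for running one command as root on a remote host,
    // authenticating with the given private key (ssh's -i option).
    static List<String> sshCommand(String identityFile, String host, String remoteCommand) {
        List<String> argv = new ArrayList<>();
        argv.add("ssh");
        argv.add("-i");
        argv.add(identityFile);
        argv.add("root@" + host);
        argv.add(remoteCommand);
        return argv;
    }

    public static void main(String[] args) {
        // The key location is read from the env variable named in the issue description.
        String key = System.getenv("BIGTOP_SMOKES_CLUSTER_IDENTITY_FILE");
        if (key == null) {
            System.err.println("BIGTOP_SMOKES_CLUSTER_IDENTITY_FILE is not set");
            return;
        }
        // Hypothetical remote command; printed here, not executed.
        System.out.println(String.join(" ", sshCommand(key, "localhost", "service crond stop")));
    }
}
```

Note that the host name still has to be passed in by the caller, which is exactly the coupling to concrete hosts that a topology abstraction would remove from the tests.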