[ 
https://issues.apache.org/jira/browse/BIGTOP-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884993#comment-13884993
 ] 

Roman Shaposhnik commented on BIGTOP-1192:
------------------------------------------

[~mantonov] A couple of high-level comments:
   # ssh-ing & the like -- this is where the connection with BIGTOP-635 comes
into play. What I had in mind is that things like NetworkShutdownFailure would
actually utilize a generic cluster manipulation framework instead of explicitly
calling ssh or whatever. Now, I'm not saying that ssh is bad as a first cut --
rather that, instead of calling it directly, there probably should be an
interface (part of BIGTOP-635) whose methods get called to perform these
actions. Otherwise you'll end up with things like new
ServiceKilledFailure(["localhost"], "crond") in real tests -- IOW things that
need to know host names, etc., instead of simply referring to the topology of
the cluster in an abstract manner and calling methods on the various nodes that
are part of that topology.
  # "should we have bash script for all that setup?" -- absolutely! In fact, 
better yet -- we probably should create a puppet module. That's how we set up 
our clusters anyway.
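
To make the first point a bit more concrete, here is a rough sketch of the
kind of interface I mean. None of these names exist in the patch or in
BIGTOP-635: ClusterNode, ClusterTopology, RoleBasedServiceKill, RecordingNode
and StaticTopology are all made up for illustration, and the pkill command
line is just one assumed way of doing 'kill -9' on a service. The point is
only that the failure asks the topology for nodes by role and never sees a
host name; a first-cut ClusterNode implementation could still use ssh
underneath.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical: one node of the cluster; how a command reaches it is hidden.
interface ClusterNode {
    String role();
    void execute(String command);
}

// Hypothetical: lets tests select nodes by role instead of by host name.
interface ClusterTopology {
    List<ClusterNode> nodesWithRole(String role);
}

// Hypothetical failure that kills a service on every node with a given role.
final class RoleBasedServiceKill {
    private final ClusterTopology topology;
    private final String role;
    private final String service;

    RoleBasedServiceKill(ClusterTopology topology, String role, String service) {
        this.topology = topology;
        this.role = role;
        this.service = service;
    }

    // Sends the kill command to each matching node; returns how many were targeted.
    int inject() {
        List<ClusterNode> targets = topology.nodesWithRole(role);
        for (ClusterNode node : targets) {
            // Assumed command line; a real implementation may differ.
            node.execute("pkill -9 -f " + service);
        }
        return targets.size();
    }
}

// Test double: records commands instead of running them over ssh.
final class RecordingNode implements ClusterNode {
    private final String role;
    private final List<String> commands = new ArrayList<>();

    RecordingNode(String role) { this.role = role; }

    @Override public String role() { return role; }
    @Override public void execute(String command) { commands.add(command); }
    List<String> commands() { return commands; }
}

// Fixed in-memory topology, standing in for whatever describes the real cluster.
final class StaticTopology implements ClusterTopology {
    private final List<ClusterNode> nodes;

    StaticTopology(List<? extends ClusterNode> nodes) {
        this.nodes = new ArrayList<>(nodes);
    }

    @Override public List<ClusterNode> nodesWithRole(String role) {
        List<ClusterNode> matching = new ArrayList<>();
        for (ClusterNode node : nodes) {
            if (node.role().equals(role)) {
                matching.add(node);
            }
        }
        return matching;
    }
}
```

With something along these lines, a test would build a topology once and then
write new RoleBasedServiceKill(topology, "worker", "crond").inject() without
mentioning "localhost" or any other host name. Swapping RecordingNode for an
ssh-backed node would not change the test code.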

> Add utilities to facilitate cluster failure testing into bigtop-test-framework
> ------------------------------------------------------------------------------
>
>                 Key: BIGTOP-1192
>                 URL: https://issues.apache.org/jira/browse/BIGTOP-1192
>             Project: Bigtop
>          Issue Type: New Feature
>          Components: Tests
>            Reporter: Mikhail Antonov
>            Assignee: Mikhail Antonov
>              Labels: bigtop, itest, smokes
>             Fix For: 0.8.0
>
>         Attachments: BIGTOP-1192.1.patch, BIGTOP-1192.2.patch
>
>
> The goal is to provide Bigtop module maintainers with a set of util
> classes to help develop smoke tests able to simulate certain failures during
> smoke test execution on a cluster.
> Summary of what is provided in the current patch.
> The following failure types are supported now:
>  - Service stopped and restarted (on given set of nodes)
>  - Service killed with 'kill -9' and started back up (on given set of nodes)
>  - Node inbound/outbound connections are shut down and brought back up (via 
> iptables).
>  
> System requirements to run smoke tests with failures.
>  *  password-less (PKI-based) root ssh to all nodes in cluster being tested 
> is assumed.
>  *  for local tests, like ClusterFailuresTest, one should have password-less 
> root ssh to localhost.
>  *  env variable BIGTOP_SMOKES_CLUSTER_IDENTITY_FILE should point to the
> corresponding private key file.
> Further thoughts (not included in this patch)
>   Cluster provisioning
>    - The Bigtop test framework (the failures part of it) doesn't need to know
> about cluster topology, as it simply executes a set of SSH commands on remote
> hosts (whose addresses are provided by the developer of the specific module's
> smoke test). But the actual tests do need to know about cluster topology to
> run sophisticated failure scenarios.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
