[ 
https://issues.apache.org/jira/browse/BIGTOP-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885063#comment-13885063
 ] 

Mikhail Antonov edited comment on BIGTOP-1192 at 1/29/14 6:34 AM:
------------------------------------------------------------------

Roman,

 1) Agreed that a more convenient interface would be better. For now, though, 
within the scope of this jira and with the approach of "get the specific 
problem addressed", it's probably ok to have 2 layers of abstraction: the lower 
one being the Shell class to execute arbitrary commands, the higher one being 
something like "restart service S1 on hosts H1, H2, H3". If there's demand for 
more types of failures, it will definitely be worth further improvement.

 2) Regarding "where to keep and specify host names etc": the code in 
bigtop-test-framework (itest), at the level of abstraction used in this patch, 
doesn't really need to know the cluster topology, right? It operates at the 
level of executing a certain logical command, like "restart service S1" on 
hosts H1, H2, H3. Where the host names come from, itest doesn't care for now. 
But the actual module smoke tests definitely should. 

So I would say these are 2 different issues we need to address.

One is having an API to execute a well-defined set of logical commands against 
a specified list of nodes. That is, I guess, what this jira is about (and 
probably other types of failures will be added to this list later, as the need 
arises - for example, I don't know, "run some program on node H1 which eats up 
almost all memory/CPU/network bandwidth" to softly shake the services).
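
To make the two layers concrete, here is a minimal sketch in Python (the real 
framework is not written in Python, and none of these names come from the 
patch: ssh_argv, restart_service and the injected run callable are all 
hypothetical). The lower layer only builds an ssh command line for an arbitrary 
command; the higher layer expresses "restart service S1 on hosts H1, H2, H3" on 
top of it. The use of "service <name> restart" is an assumption too.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical type: something that runs an argv list and returns an exit code.
Runner = Callable[[List[str]], int]


def ssh_argv(host: str, command: str, identity_file: str) -> List[str]:
    """Lower layer: argv that runs an arbitrary command on a host as root."""
    return ["ssh", "-i", identity_file, "-o", "BatchMode=yes",
            f"root@{host}", command]


def restart_service(service: str, hosts: Sequence[str], run: Runner,
                    identity_file: str) -> Dict[str, int]:
    """Higher layer: restart one service on each host, return exit codes."""
    results = {}
    for host in hosts:
        # Assumes the service is managed through the 'service' command.
        results[host] = run(ssh_argv(host, f"service {service} restart",
                                     identity_file))
    return results


# Possible usage against a real cluster (not executed here):
#   import os, subprocess
#   key = os.environ["BIGTOP_SMOKES_CLUSTER_IDENTITY_FILE"]
#   restart_service("S1", ["H1", "H2", "H3"], subprocess.call, key)
```

Passing the runner in keeps the higher layer testable without any ssh access, 
and the caller decides where the host list comes from.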

The second is to have a way to describe the cluster's logical topology - 
network, nodes, roles, services etc. - and be able to access it from the tests. 
That is what is needed to run real, complex smoke tests from jenkins builds in 
a flexible way. I guess that's the next step (which I'd also be glad to 
contribute to).
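
As a rough illustration of what such a topology description might look like 
from the test side, here is a small Python sketch. Everything in it is an 
assumption: the ClusterTopology name, the service-to-hosts layout and the 
lookup methods are invented for illustration, and nothing like this exists in 
the patch.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ClusterTopology:
    # Hypothetical layout: service name -> hosts that run it.
    services: Dict[str, List[str]] = field(default_factory=dict)

    def hosts_for(self, service: str) -> List[str]:
        """Hosts running the given service (empty list if unknown)."""
        return list(self.services.get(service, []))

    def all_hosts(self) -> List[str]:
        """Every host in the cluster, without duplicates, in first-seen order."""
        seen: List[str] = []
        for hosts in self.services.values():
            for host in hosts:
                if host not in seen:
                    seen.append(host)
        return seen
```

A smoke test could then ask the topology for the hosts of service S1 and hand 
that list to the failure API, so the framework itself never needs to know the 
layout.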



> Add utilities to facilitate cluster failure testing into bigtop-test-framework
> ------------------------------------------------------------------------------
>
>                 Key: BIGTOP-1192
>                 URL: https://issues.apache.org/jira/browse/BIGTOP-1192
>             Project: Bigtop
>          Issue Type: New Feature
>          Components: Tests
>    Affects Versions: 0.7.0
>            Reporter: Mikhail Antonov
>            Assignee: Mikhail Antonov
>              Labels: itest, smokes
>             Fix For: 0.8.0
>
>         Attachments: BIGTOP-1192.1.patch, BIGTOP-1192.2.patch
>
>
> The goal is to provide Bigtop module maintainers with a set of util classes 
> to help develop smoke tests able to simulate certain failures during smoke 
> test execution on a cluster.
> Summary of what is provided in the current patch. 
> The following failure types are supported now:
>  - Service stopped and restarted (on given set of nodes)
>  - Service killed with 'kill -9' and started back up (on given set of nodes)
>  - Node inbound/outbound connections are shut down and brought back up (via 
> iptables).
>  
> System requirements to run smoke tests with failures:
>  *  password-less (PKI-based) root ssh to all nodes in the cluster being 
> tested is assumed.
>  *  for local tests, like ClusterFailuresTest, one should have password-less 
> root ssh to localhost.
>  *  env variable BIGTOP_SMOKES_CLUSTER_IDENTITY_FILE should point to the 
> corresponding private key file.
> Further thoughts (not included in this patch)
>   Cluster provisioning
>    - The Bigtop test framework (the failures part of it) doesn't need to know 
> about cluster topology, as it simply executes a set of SSH commands on remote 
> hosts (whose addresses are provided by the developer of the specific module's 
> smoke test). But the actual tests do need to know about cluster topology to 
> run sophisticated failure scenarios.
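
The three failure types listed in the description could map onto pairs of shell 
commands roughly as in the Python sketch below. This is only one plausible 
mapping, not the commands used by the attached patch: failure_commands, its 
parameters and the exact command strings are assumptions. In particular, the 
network case here drops traffic for a single peer address only, since a blanket 
DROP rule would also cut the ssh session needed to undo it.

```python
from typing import Tuple


def failure_commands(kind: str, service: str = "", peer: str = "") -> Tuple[str, str]:
    """Return hypothetical (inject, restore) shell commands for a failure type."""
    if kind == "restart":
        # Service stopped and started again.
        return (f"service {service} stop", f"service {service} start")
    if kind == "kill":
        # pkill -f matches against the full command line of the process.
        return (f"pkill -9 -f {service}", f"service {service} start")
    if kind == "network":
        # Append DROP rules for one peer, then delete the same rules to restore.
        inject = (f"iptables -A INPUT -s {peer} -j DROP && "
                  f"iptables -A OUTPUT -d {peer} -j DROP")
        restore = (f"iptables -D INPUT -s {peer} -j DROP && "
                   f"iptables -D OUTPUT -d {peer} -j DROP")
        return (inject, restore)
    raise ValueError(f"unknown failure kind: {kind}")
```

Each command string would be run as root over ssh on the target node, with the 
restore command issued once the test has observed the failure.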



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
