elek commented on issue #11: HDDS-2291. Acceptance tests for OM HA.
URL: https://github.com/apache/hadoop-ozone/pull/11#issuecomment-544410792
 
 
   Thank you very much for the patch @hanishakoneru . Overall I am very happy to 
have more HA tests with the robot framework, and I would be happy to commit it 
(after clean builds).
   
   _Personally_ I would prefer a different approach, but only because I may 
think about it differently; it may not be better or worse. The only thing I 
would like to do here is explain my view, because this is the fun part: 
understanding each other's thinking.
   
   __1. The level of the tests__
   
   To run acceptance tests we need to solve two problems:
   
    1. Create a running Ozone cluster (and possibly restart services during the 
tests)
    2. Execute commands and check the results (run tests + assert)
   
   Currently these two roles/levels are separated. 
   
   The second one is implemented by the [robot 
tests](https://github.com/apache/hadoop-ozone/tree/master/hadoop-ozone/dist/src/main/smoketest),
 but the (existing) robot tests don't include any logic to start (or restart) 
services.
   
   The environments are mainly defined with docker-compose files, and the logic 
to start them is defined by __shell scripts__ (for example 
[this](https://github.com/apache/hadoop-ozone/blob/master/hadoop-ozone/dist/src/main/compose/ozone/test.sh)
 is the simplest one).
   
   The two levels/roles are separated.
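   Just to illustrate the pattern (a rough sketch, not the exact content of 
test.sh; the service name `scm` and the robot file path are my assumptions 
here):

   ```shell
   #!/usr/bin/env bash
   set -e
   # Level 1: the shell script creates the environment with docker-compose.
   docker-compose up -d
   # Level 2: the robot tests execute the commands and do the assertions,
   # without knowing anything about how the cluster was created.
   docker-compose exec -T scm robot /opt/hadoop/smoketest/basic/basic.robot
   # Level 1 again: the script tears the environment down.
   docker-compose down
   ```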
   
   __2. The flexibility__
   
   The main advantage of this approach is that you can run the tests in 
different environments. For example, I can replace the __shell__ script based 
cluster creation process with anything else:
    
    1. I can create kubernetes clusters and execute the same robot tests inside.
    2. Anybody can execute the same robot tests in any commercial Hadoop/Ozone 
distribution.
   
   __3. Blockade__
   
   Blockade based tests are slightly different: they do both 1 (cluster 
creation) and 2 (test + assertion), mainly because they are more interested in 
the environment setup (creating the cluster, shutting down nodes, etc.).
   
   They do all the cluster setup / teardown based on docker-compose, and the 
logic is defined in python scripts.
   
   __4. Docker + ssh__
   
   This patch follows a different approach. Instead of using docker-compose to 
start/stop/restart services/nodes, it installs an additional ssh daemon inside 
the containers, which makes it possible to restart the JVM process instead of 
the container (docker-compose is used to start/stop the services, and the ssh 
daemons are used to restart them).
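   A restart step in this style looks roughly like the following sketch (the 
host name `om1`, the process pattern, and the daemon command are illustrative 
assumptions, not the exact content of the patch):

   ```shell
   # The container (whose main process is the ssh daemon) stays up;
   # only the OM JVM inside it is stopped and started again.
   ssh om1 'pkill -f OzoneManager'
   ssh om1 '/opt/hadoop/bin/ozone --daemon start om'
   ```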
   
   Usually this is not the suggested way to work in containerized environments. 
With docker it's usually easier to restart the containers themselves and to run 
only one process per container (which provides better separation and easier 
management).
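   For comparison, the docker-native equivalent achieves the same effect at the 
container level (the service names here are illustrative):

   ```shell
   # Restart a whole service container instead of the process inside it:
   docker-compose restart om
   # Or simulate a node failure:
   docker-compose stop datanode
   docker-compose start datanode
   ```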
   
   __5. This patch__
   
   But the previous approach (using docker-compose to start/stop the containers 
instead of ssh) is not portable at all. It can't be used inside Kubernetes, for 
example, without significant effort.
   
   On the other hand, this *patch can be used very easily* in other 
environments, as the "service restart" part of the environment management is 
included (with the help of ssh).
   
   **Summary**:
   
    * This is a slightly different approach from what we followed in the other 
tests, and not the mainstream usage of containers
    * But it's very effective and has some clear advantages (it's easier to 
re-use the tests in different environments)
    * I have ideas about how it could be done in a different way, but those 
have different drawbacks (and different advantages)
   
   In other words: if we separate the _environment creation_ from the _test 
definitions_, where should we put the restart functionality? You put it in the 
place where we have the _test definitions_; I described a system where it can 
be put in the place where we have the _environment creation_.
   
   I think both approaches are acceptable, and __I will commit this one after a 
green acceptance test run__. 
   
   (And we can continue thinking about how these tests can evolve. For example: 
do we need to separate these kinds of tests and create more tests where we 
restart clusters?)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 