elek commented on issue #11: HDDS-2291. Acceptance tests for OM HA.
URL: https://github.com/apache/hadoop-ozone/pull/11#issuecomment-544410792
 
 
   Thank you very much for the patch @hanishakoneru . Overall I am very happy to 
have more HA tests with the robot framework, and I would be happy to commit it 
(after clean builds).
   
   _Personally_ I would prefer a different approach, but only because I may 
think about it differently; it may not be better or worse. The only thing I 
would like to do here is explain my view, because this is the fun part: 
understanding each other's thinking.
   
   __1. The level of the tests__
   
   To run acceptance tests we need to solve two problems:
   
    1. Create a running Ozone cluster (and possibly restart services during the 
tests)
    2. Execute commands and check the results (run tests + assert)
   
   Currently these two roles/levels are separated. 
   
   The second one is implemented by the [robot 
tests](https://github.com/apache/hadoop-ozone/tree/master/hadoop-ozone/dist/src/main/smoketest),
 but the (existing) robot tests don't include any logic to start (or restart) 
services.
   
   The environments are mainly defined with docker-compose files, and the logic 
to start them is defined by __shell scripts__ (for example 
[this](https://github.com/apache/hadoop-ozone/blob/master/hadoop-ozone/dist/src/main/compose/ozone/test.sh)
 is the simplest one).
   
   The two levels/roles are separated.
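   Just to illustrate the pattern (a rough sketch, not the exact content of 
test.sh; the service name `scm` and the robot file path are my assumptions 
here):

   ```shell
   #!/usr/bin/env bash
   set -e
   # Level 1: the shell script creates the environment with docker-compose.
   docker-compose up -d
   # Level 2: the robot tests execute the commands and do the assertions,
   # without knowing anything about how the cluster was created.
   docker-compose exec -T scm robot /opt/hadoop/smoketest/basic/basic.robot
   # Level 1 again: the script tears the environment down.
   docker-compose down
   ```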
   
   __2. The flexibility__
   
   The main advantage of this approach is that you can run the tests in 
different environments. For example, I can replace the __shell__ script based 
cluster creation process with anything else:
    
    1. I can create kubernetes clusters and execute the same robot tests inside.
    2. Anybody can execute the same robot tests in any commercial Hadoop/Ozone 
distribution.
   
   __3. Blockade__
   
   Blockade based tests are slightly different: they do both 1 (cluster 
creation) and 2 (test + assertion), mainly because they are more interested in 
the environment setup (creating the cluster, shutting down nodes, etc.).
   
   They do all the cluster setup / teardown based on docker-compose, and the 
logic is defined in python scripts.
   
   __4. Docker + ssh__
   
   This patch follows a different approach. Instead of using docker-compose to 
start/stop/restart services/nodes, it installs an additional ssh daemon inside 
the containers, which makes it possible to restart the JVM process instead of 
the container (docker-compose is used to start/stop the services, and the ssh 
daemons are used to restart them).
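   A restart step in this style looks roughly like the following sketch (the 
host name `om1`, the process pattern, and the daemon command are illustrative 
assumptions, not the exact content of the patch):

   ```shell
   # The container (whose main process is the ssh daemon) stays up;
   # only the OM JVM inside it is stopped and started again.
   ssh om1 'pkill -f OzoneManager'
   ssh om1 '/opt/hadoop/bin/ozone --daemon start om'
   ```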
   
   Usually this is not the suggested way to work in containerized environments. 
With docker it's usually easier to restart the containers themselves and to run 
only one process per container (which provides better separation and easier 
management).
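   For comparison, the docker-native equivalent achieves the same effect at the 
container level (the service names here are illustrative):

   ```shell
   # Restart a whole service container instead of the process inside it:
   docker-compose restart om
   # Or simulate a node failure:
   docker-compose stop datanode
   docker-compose start datanode
   ```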
   
   __5. This patch__
   
   But the previous approach (using docker-compose to start/stop the containers 
instead of ssh) is not portable at all. It can't be used inside Kubernetes, for 
example, without significant effort.
   
   On the other hand, this *patch can be used very easily* in other 
environments, as the "service restart" part of the environment management is 
included (with the help of ssh).
   
   **Summary**:
   
    * This is a slightly different approach from what we followed in the other 
tests, and not the mainstream usage of containers
    * But it's very effective and has some clear advantages (it's easier to 
re-use the tests in different environments)
    * I have ideas about how it could be done in a different way, but those 
have different drawbacks (and different advantages)
   
   In other words: if we separate the _environment creation_ from the _test 
definitions_, where should we put the restart functionality? You put it in the 
place where we have the _test definitions_; I described a system where it can 
be put in the place where we have the _environment creation_.
   
   I think both approaches are acceptable, and __I will commit this one after a 
green acceptance test run__. 
   
   (And we can continue thinking about how these tests can evolve. For example: 
do we need to separate these kinds of tests and create more tests where we 
restart clusters?)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 