[https://issues.apache.org/jira/browse/HDDS-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16833629#comment-16833629]
Elek, Marton commented on HDDS-1458:
------------------------------------
Thanks for the answers, [~eyang].
I think our views are getting closer. At least we can agree on some technical
properties. The big difference (IMHO) is that we have different views about the
importance of some issues/use cases. (The proposed solution introduces heavy
limitations on the current usage patterns.)
The other difference is that I can see multiple, different types of container
usage. Correct me if I am wrong, but as far as I understood, you would like to
use containers in the same way everywhere. I will try to explain this at the
end of this comment.
bq. The docker process adds 32 seconds. It is 26% increase in build time,
Yes, we agree that it's slower.
Without SSD it's even slower.
Yes, I think it's a problem; this is the reason why I put it under the "Cons"
list.
No, we can't skip the docker image creation, as the pseudo-cluster creation
should be available all of the time (as it is with the current solution). I
think BOTH the unit tests AND the integration tests should be checked ALL the
time.
SUMMARY:
* Where we agree: the build is significantly slower.
* Where we don't agree: I think we need to keep it simple and fast to execute
the smoketest, ALL the time. This is not supported by the proposed solution.
bq. 2. It's harder to test locally the patches. The reproducibility are
decreased. (Instead of the final build a local container is used).
bq. Not true. Docker can use a local container or use official image by
supplying -DskipDocker flag, and let docker compose yaml file decide to use
local image or official released binary for fault-injection-test maven project.
If you want to apply patch to official released docker image, then you are are
already replacing binaries in docker image, it is no longer the official
released docker image. Therefore, what is wrong with using a local image that
give you exactly same version of everything that is described in Dockerfile of
the official image?
Not exactly, I am talking about a different thing. Let's imagine that you start
two builds in parallel. How do you know which image is used to execute the
tests? You can't be sure. There is no direct relationship between the
checked-out source code, the docker image which is created, and the compose
files. (We would need build-specific tag names for that, saved somewhere.)
With mounting the volumes (as we do it now) we can be sure, and you can execute
multiple smoketests in parallel. The sketch below illustrates the difference.
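A hedged sketch (the paths, the image tag and the mount target are
illustrative, not taken from the actual build scripts):
{code}
# Two parallel builds sharing one docker daemon: the tag is ambiguous.
(cd /path/to/checkout-A && docker build -t ozone:dev .) &
(cd /path/to/checkout-B && docker build -t ozone:dev .) &
wait
# Whichever build finishes last owns ozone:dev; both test runs will use it.

# With a volume mount the container always runs the bits of THIS checkout:
docker run --rm \
  -v "$(pwd)/hadoop-ozone/dist/target/ozone-0.5.0:/opt/hadoop" \
  apache/hadoop-runner /opt/hadoop/bin/ozone version
{code}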
SUMMARY:
* I think we are talking about different things.
* I think parallel execution is broken by the proposed solution.
bq. 3. It's harder to test a release package from the smoketest directory.
bq. Smoketest can converted to another submodule of ozone. The same
-DskipDocker can run smoketest with official build without using local image. I
will spend some time on this part of the project to make sure that I didn't
break anything.
Please don't do it. As I wrote, I would like to keep the possibility to execute
the smoketests WITHOUT a build. I think this is a very useful feature to:
1. Test the convenience binary package during the vote
2. Smoketest a different install (e.g. a kubernetes install)
Please check my last vote. I executed the smoketest for both the src package
and the bin package to be sure that both are good. (See the sketch below.)
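A minimal sketch of this workflow (the package name and layout are
illustrative):
{code}
# Extract a convenience binary package: no source tree, no maven involved.
tar xzf ozone-0.5.0.tar.gz
cd ozone-0.5.0/smoketest
# Run the same robot-based smoketests against the packaged binaries.
./test.sh
{code}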
SUMMARY:
* You think it's enough to execute the smoketest during the build.
* I think it's very important to make it possible to run the smoketests without
a build, from any install (convenience binary, kubernetes, on-prem install,
etc.)
bq. Security fixes can be applied later to the docker containers.
bq. The correct design is to swap out the docker container completely with a
new version. Patch and upgrade strategy should not be in-place modification of
the docker container that grows over time. Overlay file system will need to be
committed to retain state. By running container with in place binary
replacement can lead to inconsistent state of docker container when power
failure happens.
Again, I am not sure that we are talking about the same problem. I would like
to "swap out" the old docker images completely. And because the first layers
are updated, they won't "grow" over time.
But let's make the conversation clearer: let's talk about tags. If I understood
correctly, in case of a security issue you would like to drop (?) all the old
images (e.g. hadoop:2.9.0, hadoop:3.1.0, hadoop:3.2.0, hadoop:3.2.1) and
create a new one (hadoop:3.2.2).
First of all, I don't know how you would like to drop the images (as of now you
need to create an INFRA ticket for that). But there could be a lot of users of
the old images. What would you do with them? Dropping old images would break a
lot of users (it's exactly the same as _deleting_ all the old hadoop releases
in case of a security issue, which we don't do). What I would prefer instead is
sketched below.
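A hedged sketch (the repository name is illustrative): keep the old tags, but
rebuild them on top of a patched base image, so the lower layers are replaced
and nothing grows:
{code}
# Rebuild an existing tag from scratch on top of the patched base image;
# the tag stays stable for its users, only the layers underneath change.
docker build --pull --no-cache -t apache/hadoop:3.1.0 .
docker push apache/hadoop:3.1.0
{code}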
SUMMARY:
* AFAIK you would like to support only the latest images.
* I have a lot of use cases where I need old images, and I would like to give
them _limited_ support (e.g. eventually update the underlying OS in the
container).
bq. 5. It conflicts if more than one builds are executed on the same machine
(docker images are shared but volume mounts are separated). (Even just this one
is a blocker problem for me)
bq. Does this mean you have multiple source tree running build? Why not use
multiple start-build-env.sh on parallel source trees? I think it can provide
the same multi-build process, but need to test it out.
Yes, it means multiple builds.
Yes, I would like to support it even without start-build-env.sh.
Yes, I would like to support it on Jenkins (where start-build-env.sh is not
used).
No, start-build-env.sh doesn't solve the problem, as you added the following
line:
{code}
+ -v "/var/run/docker.sock:/var/run/docker.sock" \
{code}
which is added by Jenkins anyway, so you can't fix it just by removing it from
start-build-env.sh. (See the illustration below.)
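To make the conflict concrete (a hedged illustration; the "build-env" image
name is hypothetical):
{code}
# Jenkins starts the build container with the host's docker socket mounted:
docker run --rm -v /var/run/docker.sock:/var/run/docker.sock build-env mvn verify
# Inside the build there is no isolated docker daemon: all parallel executors
# share one image store, so removing the same mount from start-build-env.sh
# changes nothing on Jenkins.
{code}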
SUMMARY:
* I have important use cases which are not supported by the proposed change.
bq. 6. Some tests can't be executed from the final tarball. (I would like to
execute tests from the released binary, as discussed earlier in the original
jira).
bq. Same answer as before use -DskipDocker flag, or use mvn install final
tarball, and run docker build. Maybe I am missing something, please clarify.
I think I already (tried to) clarify it earlier. I would like to execute all
the smoketests (and blockade tests) from the distribution package. It may not
be important for you, but please respect my wish to support it.
* I would like to test the convenience release binaries during a vote (yes, I
would like to be sure that both the src package (the de facto release) and the
convenience binary package are fine).
* I would like to execute the smoketests in different environments: for example
on kubernetes nodes, to be sure that it's installed well (I do it even now, and
it's very useful; see the sketch below).
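A hedged sketch (the pod name and the in-container path are illustrative, not
the actual kubernetes definitions):
{code}
# Deploy ozone to kubernetes, then run the same smoketest suite from inside
# one of the running pods:
kubectl apply -f ozone/
kubectl exec -it ozone-om-0 -- bash -c \
  'cd /opt/hadoop/smoketest && ./test.sh'
{code}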
SUMMARY:
* I have important use cases which are not supported by the proposed change.
bq. I have a hard time with Ozone stable environment. Hadoop-runner is a read
only file system without symlink support, and all binaries and data are mounted
from external volume. What benefit does this provide?
I did my best to enumerate the benefits (see the cons list): reproducible test
execution, parallel test execution, and the ability to execute the tests with
exactly the same bits.
bq. Everything is interacting with outside of the container environment. What
containment does this style of docker image provides? I only see process
namespace isolation, network interface simulation. But it lacks of ability to
make docker container idempotent and reproducible else where.
You are 100% right. Here (for testing) we use only namespace and network
isolation and a reproducible runtime environment. I think what you are looking
for is another use case (what I called option (1) in my previous comment). And
I totally agree that we need to support BOTH.
SUMMARY:
* You would like to use fully portable and self-contained containers for all
the use cases.
* I can see multiple styles of container usage, and while I would like to
support the style you prefer (see the k8s-dev profile and the apache/ozone
image), I also would like to use containers for development and testing with a
different 'style', to get a better, faster and more effective development
experience.
bq. This can create problems that code only works in one node but can not
reproduce else where. I often see distributed software being developed on one
laptop, and having trouble to run in cluster of nodes. The root cause is the
mounted volume is shared between containers, and developers often forgot about
this and wrote code that does local IO access. When it goes to QA, nothing
works in distributed nodes. It would be good to prevent ourselves from making
this type of mistakes by not sharing the same mount points between containers
as base line. I know that you may not have this problem with deeper
understanding of distributed file system. However, it is common mistake among
junior system developers. Application programmers may use shared volumes to
exchange data between containers, but not the team that suppose to build
distributed file system. I would like to prevent this type of silliness from
happening by making sure that ozone processes don't exchange data via shared
volumes.
Thanks for this comment, as it's a very technical concern; I understand it. If
I understood correctly, you say that two components may eventually write the
same file. I think there is a very low chance of this kind of problem. I have
never seen it until now, and our current code structure makes it very hard (we
write most of the data through the Metadata interface), especially as we have a
very strong code review process.
But I understand the risk, and that's one reason why I would like to support
executing the smoketests on kubernetes. And yes, for kubernetes we need to
create the real containers, exactly what you would like to use (see, again, the
k8s-dev profile).
In fact, the local writable areas of the components are part of the containers
(/data or /tmp, usually) and are not mounted.
SUMMARY:
* You have some concerns about the problems we could get when multiple
components try to write the same files.
* I think the chance of such problems is low, and we haven't seen any until
now. Adding the ":ro" flag to the local mount can solve this problem (see the
sketch below).
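A minimal sketch of that (the version in the path is illustrative): the shared
binaries are mounted read-only, while each component keeps its writable state
inside its own container:
{code}
# Read-only mount of the shared build output; any accidental write to the
# shared path fails immediately instead of leaking state between containers.
docker run --rm \
  -v "$(pwd)/hadoop-ozone/dist/target/ozone-0.5.0:/opt/hadoop:ro" \
  apache/hadoop-runner /opt/hadoop/bin/ozone datanode
# The writable areas (/data, /tmp) stay inside the container's own filesystem.
{code}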
Let's talk about testing. For testing we usually have multiple layers
(sometimes called the Test Pyramid:
https://martinfowler.com/bliki/TestPyramid.html). We may have unit tests,
integration tests, acceptance tests, etc.
I have a very similar view about container usage. There are multiple layers of
container usage. I think the first layer is to use containers for the
network/disk layout isolation, and the next level is to use totally independent
and portable containers. And I totally agree with you about the benefit of the
second level (portable containers), and I think it's very important to support
that level. At this level we have the apache/ozone docker image (totally
portable and self-contained), and we have a tool to create similar containers
for the dev images (the k8s-dev and k8s-dev-push profiles).
The big difference in our views is that I can see a very huge benefit in using
containers on level one, where we use only the network/disk layout isolation.
This is not a replacement for the usage of full containers but another level of
usage, to make it easy to develop and test (as you can see, most of my comments
are related to the development and testing process).
I think it's a very important distinction. And the root cause of the conflict
between our views is that you look for level-2 container usage where we use
containers only on level 1 (because it's more effective).
bq. I haven't seen a project in Dockerhub that use the same technique as option
2. Can you show me some examples of similar projects?
Dockerhub is usually used to store level-2 images, but there are patterns where
the containers are used just as an environment: for example, when a simple go
docker image is used to build an application (and not with a multi-stage docker
build), as in the sketch below.
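This is the common pattern documented for the golang image (the paths are
illustrative): the container is only a build environment, and the sources live
on the host:
{code}
# "Level 1" usage: the golang image is just an environment; the application
# sources are mounted from the host instead of being baked into an image.
docker run --rm \
  -v "$(pwd):/usr/src/myapp" -w /usr/src/myapp \
  golang:1.12 go build -v ./...
{code}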
> Create a maven profile to run fault injection tests
> ---------------------------------------------------
>
> Key: HDDS-1458
> URL: https://issues.apache.org/jira/browse/HDDS-1458
> Project: Hadoop Distributed Data Store
> Issue Type: Test
> Reporter: Eric Yang
> Assignee: Eric Yang
> Priority: Major
> Attachments: HDDS-1458.001.patch, HDDS-1458.002.patch,
> HDDS-1458.003.patch
>
>
> Some fault injection tests have been written using blockade. It would be
> nice to have ability to start docker compose and exercise the blockade test
> cases against Ozone docker containers, and generate reports. This is
> optional integration tests to catch race conditions and fault tolerance
> defects.
> We can introduce a profile with id: it (short for integration tests). This
> will launch docker compose via maven-exec-plugin and run blockade to simulate
> container failures and timeout.
> Usage command:
> {code}
> mvn clean verify -Pit
> {code}