[https://issues.apache.org/jira/browse/HDDS-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16833629#comment-16833629]
Elek, Marton commented on HDDS-1458:
------------------------------------
Thanks for the answers, [~eyang].
I think our views are getting closer. At least we can agree on some technical
properties. The big difference (IMHO) is that we have different views about the
importance of some issues/use cases. (The proposed solution introduces heavy
limitations on the current usage patterns.)
The other difference is that I can see multiple, different types of container
usage. Correct me if I am wrong, but as far as I understood, you would like to
use containers in the same way everywhere. I will try to explain this at the
end of this comment.
bq. The docker process adds 32 seconds. It is 26% increase in build time,
Yes, we agree that it's slower.
Without SSD it's even slower.
Yes, I think it's a problem; this is the reason why I put it under the "Cons"
list.
No, we can't skip the docker image creation, as the pseudo-cluster creation
should be available all of the time (as it is with the current solution). I
think BOTH the unit tests AND the integration tests should be checked ALL the
time.
SUMMARY:
* Where we agree: the build is significantly slower.
* Where we don't agree: I think we need to keep it simple and fast to execute
the smoketest, ALL the time. This is not supported by the proposed solution.
bq. 2. It's harder to test locally the patches. The reproducibility are
decreased. (Instead of the final build a local container is used).
bq. Not true. Docker can use a local container or use official image by
supplying -DskipDocker flag, and let docker compose yaml file decide to use
local image or official released binary for fault-injection-test maven project.
If you want to apply patch to official released docker image, then you are are
already replacing binaries in docker image, it is no longer the official
released docker image. Therefore, what is wrong with using a local image that
give you exactly same version of everything that is described in Dockerfile of
the official image?
Not exactly, I am talking about a different thing. Let's imagine that you start
two builds in parallel. How do you know which image is used to execute the
tests? You can't be sure. There is no direct relationship between the
checked-out source code, the docker image which is created, and the compose
files. (We would need build-specific tag names for that, saved somewhere.)
With mounting the volumes (as we do it now) we can be sure, and you can execute
multiple smoketests in parallel. The sketch below illustrates the difference.
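A hedged sketch (the paths, the image tag and the mount target are
illustrative, not taken from the actual build scripts):
{code}
# Two parallel builds sharing one docker daemon: the tag is ambiguous.
(cd /path/to/checkout-A && docker build -t ozone:dev .) &
(cd /path/to/checkout-B && docker build -t ozone:dev .) &
wait
# Whichever build finishes last owns ozone:dev; both test runs will use it.

# With a volume mount the container always runs the bits of THIS checkout:
docker run --rm \
  -v "$(pwd)/hadoop-ozone/dist/target/ozone-0.5.0:/opt/hadoop" \
  apache/hadoop-runner /opt/hadoop/bin/ozone version
{code}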
SUMMARY:
* I think we are talking about different things.
* I think parallel execution is broken by the proposed solution.
bq. 3. It's harder to test a release package from the smoketest directory.
bq. Smoketest can converted to another submodule of ozone. The same
-DskipDocker can run smoketest with official build without using local image. I
will spend some time on this part of the project to make sure that I didn't
break anything.
Please don't do it. As I wrote, I would like to keep the possibility to execute
the smoketests WITHOUT a build. I think this is a very useful feature to:
1. Test the convenience binary package during the vote
2. Smoketest a different install (e.g. a kubernetes install)
Please check my last vote. I executed the smoketest for both the src package
and the bin package to be sure that both are good. (See the sketch below.)
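A minimal sketch of this workflow (the package name and layout are
illustrative):
{code}
# Extract a convenience binary package: no source tree, no maven involved.
tar xzf ozone-0.5.0.tar.gz
cd ozone-0.5.0/smoketest
# Run the same robot-based smoketests against the packaged binaries.
./test.sh
{code}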
SUMMARY:
* You think it's enough to execute the smoketest during the build.
* I think it's very important to make it possible to run the smoketests without
a build, from any install (convenience binary, kubernetes, on-prem install,
etc.)
bq. Security fixes can be applied later to the docker containers.
bq. The correct design is to swap out the docker container completely with a
new version. Patch and upgrade strategy should not be in-place modification of
the docker container that grows over time. Overlay file system will need to be
committed to retain state. By running container with in place binary
replacement can lead to inconsistent state of docker container when power
failure happens.
Again, I am not sure that we are talking about the same problem. I would like
to "swap out" the old docker images completely. And because the first layers
are updated, they won't "grow" over time.
But let's make the conversation clearer: let's talk about tags. If I understood
correctly, in case of a security issue you would like to drop (?) all the old
images (e.g. hadoop:2.9.0, hadoop:3.1.0, hadoop:3.2.0, hadoop:3.2.1) and
create a new one (hadoop:3.2.2).
First of all, I don't know how you would like to drop the images (as of now you
need to create an INFRA ticket for that). But there could be a lot of users of
the old images. What would you do with them? Dropping old images would break a
lot of users (it's exactly the same as _deleting_ all the old hadoop releases
in case of a security issue, which we don't do). What I would prefer instead is
sketched below.
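A hedged sketch (the repository name is illustrative): keep the old tags, but
rebuild them on top of a patched base image, so the lower layers are replaced
and nothing grows:
{code}
# Rebuild an existing tag from scratch on top of the patched base image;
# the tag stays stable for its users, only the layers underneath change.
docker build --pull --no-cache -t apache/hadoop:3.1.0 .
docker push apache/hadoop:3.1.0
{code}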
SUMMARY:
* AFAIK you would like to support only the latest images.
* I have a lot of use cases where I need old images, and I would like to give
them _limited_ support (e.g. eventually update the underlying OS in the
container).
bq. 5. It conflicts if more than one builds are executed on the same machine
(docker images are shared but volume mounts are separated). (Even just this one
is a blocker problem for me)
bq. Does this mean you have multiple source tree running build? Why not use
multiple start-build-env.sh on parallel source trees? I think it can provide
the same multi-build process, but need to test it out.
Yes, it means multiple builds.
Yes, I would like to support it even without start-build-env.sh.
Yes, I would like to support it on Jenkins (where start-build-env.sh is not
used).
No, start-build-env.sh doesn't solve the problem, as you added the following
line:
{code}
+ -v "/var/run/docker.sock:/var/run/docker.sock" \
{code}
which is added by Jenkins anyway, so you can't fix it just by removing it from
start-build-env.sh. (See the illustration below.)
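To make the conflict concrete (a hedged illustration; the "build-env" image
name is hypothetical):
{code}
# Jenkins starts the build container with the host's docker socket mounted:
docker run --rm -v /var/run/docker.sock:/var/run/docker.sock build-env mvn verify
# Inside the build there is no isolated docker daemon: all parallel executors
# share one image store, so removing the same mount from start-build-env.sh
# changes nothing on Jenkins.
{code}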
SUMMARY:
* I have important use cases which are not supported by the proposed change.
bq. 6. Some tests can't be executed from the final tarball. (I would like to
execute tests from the released binary, as discussed earlier in the original
jira).
bq. Same answer as before use -DskipDocker flag, or use mvn install final
tarball, and run docker build. Maybe I am missing something, please clarify.
I think I already (tried to) clarify it earlier. I would like to execute all
the smoketests (and blockade tests) from the distribution package. It may not
be important for you, but please respect my wish to support it.
* I would like to test the convenience release binaries during a vote (yes, I
would like to be sure that both the src package (the de facto release) and the
convenience binary package are fine).
* I would like to execute the smoketests in different environments: for example
on kubernetes nodes, to be sure that it's installed well (I do it even now, and
it's very useful; see the sketch below).
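A hedged sketch (the pod name and the in-container path are illustrative, not
the actual kubernetes definitions):
{code}
# Deploy ozone to kubernetes, then run the same smoketest suite from inside
# one of the running pods:
kubectl apply -f ozone/
kubectl exec -it ozone-om-0 -- bash -c \
  'cd /opt/hadoop/smoketest && ./test.sh'
{code}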
SUMMARY:
* I have important use cases which are not supported by the proposed change.
bq. I have a hard time with Ozone stable environment. Hadoop-runner is a read
only file system without symlink support, and all binaries and data are mounted
from external volume. What benefit does this provide?
I did my best to enumerate the benefits (see the cons list): reproducible test
execution, parallel test execution, and the ability to execute the tests with
exactly the same bits.
bq. Everything is interacting with outside of the container environment. What
containment does this style of docker image provides? I only see process
namespace isolation, network interface simulation. But it lacks of ability to
make docker container idempotent and reproducible else where.
You are 100% right. Here (for testing) we use only namespace and network
isolation and a reproducible runtime environment. I think what you are looking
for is another use case (what I called option (1) in my previous comment). And
I totally agree that we need to support BOTH.
SUMMARY:
* You would like to use fully portable and self-contained containers for all
the use cases.
* I can see multiple styles of container usage, and while I would like to
support the style you prefer (see the k8s-dev profile and the apache/ozone
image), I also would like to use containers for development and testing with a
different 'style', to get a better, faster and more effective development
experience.
bq. This can create problems that code only works in one node but can not
reproduce else where. I often see distributed software being developed on one
laptop, and having trouble to run in cluster of nodes. The root cause is the
mounted volume is shared between containers, and developers often forgot about
this and wrote code that does local IO access. When it goes to QA, nothing
works in distributed nodes. It would be good to prevent ourselves from making
this type of mistakes by not sharing the same mount points between containers
as base line. I know that you may not have this problem with deeper
understanding of distributed file system. However, it is common mistake among
junior system developers. Application programmers may use shared volumes to
exchange data between containers, but not the team that suppose to build
distributed file system. I would like to prevent this type of silliness from
happening by making sure that ozone processes don't exchange data via shared
volumes.
Thanks for this comment, as it's a very technical concern; I understand it. If
I understood correctly, you say that two components may eventually write the
same file. I think there is a very low chance of this kind of problem. I have
never seen it until now, and our current code structure makes it very hard (we
write most of the data through the Metadata interface), especially as we have a
very strong code review process.
But I understand the risk, and that's one reason why I would like to support
executing the smoketests on kubernetes. And yes, for kubernetes we need to
create the real containers, exactly what you would like to use (see, again, the
k8s-dev profile).
In fact, the local writable areas of the components are part of the containers
(/data or /tmp, usually) and are not mounted.
SUMMARY:
* You have some concerns about the problems we could get when multiple
components try to write the same files.
* I think the chance of such problems is low, and we haven't seen any until
now. Adding the ":ro" flag to the local mount can solve this problem (see the
sketch below).
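A minimal sketch of that (the version in the path is illustrative): the shared
binaries are mounted read-only, while each component keeps its writable state
inside its own container:
{code}
# Read-only mount of the shared build output; any accidental write to the
# shared path fails immediately instead of leaking state between containers.
docker run --rm \
  -v "$(pwd)/hadoop-ozone/dist/target/ozone-0.5.0:/opt/hadoop:ro" \
  apache/hadoop-runner /opt/hadoop/bin/ozone datanode
# The writable areas (/data, /tmp) stay inside the container's own filesystem.
{code}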
Let's talk about testing. For testing we usually have multiple layers
(sometimes called the Test Pyramid:
https://martinfowler.com/bliki/TestPyramid.html). We may have unit tests,
integration tests, acceptance tests, etc.
I have a very similar view about container usage. There are multiple layers of
container usage. I think the first layer is to use containers for the
network/disk layout isolation, and the next level is to use totally independent
and portable containers. And I totally agree with you about the benefit of the
second level (portable containers), and I think it's very important to support
that level. At this level we have the apache/ozone docker image (totally
portable and self-contained), and we have a tool to create similar containers
for the dev images (the k8s-dev and k8s-dev-push profiles).
The big difference in our views is that I can see a very huge benefit in using
containers on level one, where we use only the network/disk layout isolation.
This is not a replacement for the usage of full containers but another level of
usage, to make it easy to develop and test (as you can see, most of my comments
are related to the development and testing process).
I think it's a very important distinction. And the root cause of the conflict
between our views is that you look for level-2 container usage where we use
containers only on level 1 (because it's more effective).
bq. I haven't seen a project in Dockerhub that use the same technique as option
2. Can you show me some examples of similar projects?
Dockerhub is usually used to store level-2 images, but there are patterns where
the containers are used just as an environment: for example, when a simple go
docker image is used to build an application (and not with a multi-stage docker
build), as in the sketch below.
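This is the common pattern documented for the golang image (the paths are
illustrative): the container is only a build environment, and the sources live
on the host:
{code}
# "Level 1" usage: the golang image is just an environment; the application
# sources are mounted from the host instead of being baked into an image.
docker run --rm \
  -v "$(pwd):/usr/src/myapp" -w /usr/src/myapp \
  golang:1.12 go build -v ./...
{code}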
> Create a maven profile to run fault injection tests
> ---------------------------------------------------
>
> Key: HDDS-1458
> URL: https://issues.apache.org/jira/browse/HDDS-1458
> Project: Hadoop Distributed Data Store
> Issue Type: Test
> Reporter: Eric Yang
> Assignee: Eric Yang
> Priority: Major
> Attachments: HDDS-1458.001.patch, HDDS-1458.002.patch,
> HDDS-1458.003.patch
>
>
> Some fault injection tests have been written using blockade. It would be
> nice to have ability to start docker compose and exercise the blockade test
> cases against Ozone docker containers, and generate reports. This is
> optional integration tests to catch race conditions and fault tolerance
> defects.
> We can introduce a profile with id: it (short for integration tests). This
> will launch docker compose via maven-exec-plugin and run blockade to simulate
> container failures and timeout.
> Usage command:
> {code}
> mvn clean verify -Pit
> {code}