potiuk commented on issue #4938: [AIRFLOW-4117] Multi-staging Image - Travis CI tests [Step 3/3]
URL: https://github.com/apache/airflow/pull/4938#issuecomment-507790924

The case of the root user:

Explained above (and also in a few resolved comments above):

"For one, in the new environment everything runs as root. This is another fix (along the way), needed because of the way local sources are mounted on Linux when you try to run tests locally. The current approach, where sources are mounted and the airflow user is used, works on Mac, but when you try to run it on Linux with Docker it breaks in cases such as generated npm code. In the new environment, starting Docker with mounted local sources also works on Linux."

Longer explanation:

The goal of the CI image (and the upcoming Breeze environment) was to make an easily reproducible and manageable Travis CI-like environment using docker-compose. Ideally, when you run docker-compose locally with `-v <your sources>:/opt/airflow`, you should get the same experience as when you run Travis CI, but with locally modified code mounted from the host. That is how run_tests and docker-compose were implemented in the original environment, and it's a great developer experience.

@ashb, you use macOS for development, so you get two things with it:

* shared volumes are slow (try an npm build in Docker vs. npm on the host and you will see the difference)
* you have no problems with ownership of files on mapped volumes, because it is handled by osxfs - changes of ownership do not propagate to macOS (see below)

People using Linux have a different shared-volume experience (been there, done that):

* shared volumes are super-fast (pretty much native speed)
* by default users are not mapped, so files have the same permissions and user/group ownership (UID/GID) in the Docker container as in the host environment. Basically, UIDs/GIDs are the same in the host and in the container.
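To make the "users are not mapped" point concrete, here is a minimal Python sketch of how the same numeric UID is interpreted on each side of a bind mount. The user tables, the `developer` name, and UID 1000 are hypothetical; UID 501 for `airflow` is the value guessed in this comment, not a confirmed one.

```python
def owner_name(uid, passwd):
    """Resolve a numeric UID against one system's user table, as `ls -l` would."""
    # A UID with no entry in the table is displayed as a bare number.
    return passwd.get(uid, str(uid))


# Hypothetical user tables (illustrative names and numbers only).
HOST_PASSWD = {0: "root", 1000: "developer"}
CONTAINER_PASSWD = {0: "root", 501: "airflow"}

# A source file created on the Linux host and bind-mounted into the container.
# The numeric owner is shared as-is; only its interpretation differs.
file_uid = 1000

host_view = owner_name(file_uid, HOST_PASSWD)
container_view = owner_name(file_uid, CONTAINER_PASSWD)
print(host_view, container_view)
```

The file keeps one numeric owner, but the container has no user for it, so from the container's point of view the sources belong to an unknown user.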
Re-mapping users between host and Docker requires some daemon-level modifications, and it is system-wide rather than per-docker-run (see below - I provided some sources).

Because users are not mapped, if Airflow runs inside the Docker container as the "airflow" user (UID 501, I think), it has no access to the mounted volumes unless the permissions for all Airflow sources are set to o+rw(x). The effect is that the files get mounted into the Docker container, but the "airflow" user then has no access to them. I tested it on a plain Ubuntu desktop, and that is how it behaves in the old environment.

It does not behave like that in the Travis/old CI environment, because the old `run_ci.sh` script runs this line as the first thing when you enter the environment:

```
sudo chown -R airflow.airflow . $HOME/.cache $HOME/.wheelhouse/ $HOME/.cache/pip
```

That's fine when using the same docker-compose on Mac, because changing ownership does not propagate to the desktop user on the Mac. But it does propagate (or actually, it is simply changed natively) for the Linux desktop user. This means that after running the `run_ci.sh` script in the Docker container, you end up with all files owned by "501:501" on the host (I believe), because this is the airflow user ID in Docker. If you are a Lucky(TM) Linux developer and you have the same user ID 501 on the host, nothing changes for you. But this is really distro-dependent and not guaranteed in any way, so in the end a lot of people who try the docker-compose environment will end up with their Airflow sources owned by another, or even non-existent, user, just after entering the environment. This is hardly a nice development experience.

If we want to reach the developer base that uses Linux desktops (I bet we do) and give them an environment that easily reproduces the Travis CI one, then we have to make sure it works seamlessly for any developer with a Linux workstation, not only for those on Macs.
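The access problem and the side effect of the `chown` can be sketched with a simplified POSIX-style permission check. This is an illustration in Python, not the kernel's actual logic: it ignores supplementary groups, ACLs and capabilities, and the host UID 1000 and file mode `0o644` are hypothetical. UID 501 for `airflow` is the value guessed above.

```python
R, W = 4, 2  # read and write permission bits within one rwx triplet


def can_access(file_uid, file_gid, mode, proc_uid, proc_gid, want):
    """Simplified check: choose the owner, group, or other bits; root bypasses."""
    if proc_uid == 0:
        return True  # simplified: root passes read/write checks
    if proc_uid == file_uid:
        bits = (mode >> 6) & 7  # owner triplet
    elif proc_gid == file_gid:
        bits = (mode >> 3) & 7  # group triplet
    else:
        bits = mode & 7  # "other" triplet
    return (bits & want) == want


HOST_DEV = (1000, 1000)  # hypothetical Linux desktop user (uid, gid)
AIRFLOW = (501, 501)     # airflow user in the container (assumed IDs)
ROOT = (0, 0)

# Sources owned by the host developer, mode rw-r--r--.
airflow_rw = can_access(*HOST_DEV, 0o644, *AIRFLOW, R | W)
root_rw = can_access(*HOST_DEV, 0o644, *ROOT, R | W)
airflow_rw_open = can_access(*HOST_DEV, 0o666, *AIRFLOW, R | W)  # o+rw set

# After `chown -R airflow.airflow .` in the container, the numeric owner is
# 501 on the host too, so the host developer loses read-write access.
dev_rw_after_chown = can_access(*AIRFLOW, 0o644, *HOST_DEV, R | W)

print(airflow_rw, root_rw, airflow_rw_open, dev_rw_after_chown)
```

Under these assumptions the `airflow` user gets read-write access only with `o+rw` or after the `chown`, and the `chown` in turn locks out the host developer, while root works in every case.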
So far, the only approach I have found that works (across a few companies over 2 years) is to make the applications in Docker run as root. Root has access to all files no matter who owns them, and can create new files (for example .pyc) as needed. The side effect is that files created in the container inside the Airflow sources are owned by root on the host as well. This is another thing we will have to deal with, but in the Breeze environment I solved it by simply cleaning up generated files in Docker, and there is an easy way to delete them when you need to (usually when you switch branches and some directories change). This is one of the features of Breeze that helps with a case that is otherwise difficult even to understand if you are not aware of Docker internals.

Some additional sources:

https://docs.docker.com/docker-for-mac/osxfs/

This is how file sharing works on Mac. You can read there how permissions and ownership sharing work on Mac.

Here is also the discussion on how you can achieve user mapping on Linux. It requires either daemon modifications for Docker or changing the ownership after you enter the environment (but on Linux, changing the ownership also changes it on the host, so it's not really good for the desktop case and seamless sharing):

https://github.com/moby/moby/issues/22258

I hope it explains it in detail :)
