ImadYIdrissi opened a new issue #17320:
URL: https://github.com/apache/airflow/issues/17320


   
   **Apache Airflow version**: 2.1.0
   
   **Environment**:
   
   - **Cloud provider or hardware configuration**: GCP Compute Engine - e2-standard-4 (4 vCPUs, 16 GB memory)
   - **OS** (e.g. from /etc/os-release): Ubuntu 18.04
   - **Kernel** (e.g. `uname -a`): `Linux lamachine-preprod 5.4.0-1049-gcp #53~18.04.1-Ubuntu SMP Thu Jul 15 11:32:10 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux`
   
   **What happened**:
   When running `sudo docker-compose run airflow-init bash`, I get the following output:
   
   ```
   Creating be-api_airflow-init_run ... done
   ....................
   ERROR! Maximum number of retries (20) reached.
   
   Last check result:
   $ airflow db check
   Traceback (most recent call last):
     File "/home/airflow/.local/bin/airflow", line 5, in <module>
       from airflow.__main__ import main
     File "/home/airflow/.local/lib/python3.6/site-packages/airflow/__init__.py", line 34, in <module>
       from airflow import settings
     File "/home/airflow/.local/lib/python3.6/site-packages/airflow/settings.py", line 35, in <module>
       from airflow.configuration import AIRFLOW_HOME, WEBSERVER_CONFIG, conf  # NOQA F401
     File "/home/airflow/.local/lib/python3.6/site-packages/airflow/configuration.py", line 1115, in <module>
       conf = initialize_config()
     File "/home/airflow/.local/lib/python3.6/site-packages/airflow/configuration.py", line 836, in initialize_config
       with open(AIRFLOW_CONFIG, 'w') as file:
   PermissionError: [Errno 13] Permission denied: '/home/airflow/airflow.cfg'
   ERROR: 1
   ```
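The last frame is `initialize_config` trying to create `airflow.cfg` with `open(AIRFLOW_CONFIG, 'w')`. My understanding (an assumption, not verified line by line against the 2.1.0 source) is that the config path defaults to `$AIRFLOW_HOME/airflow.cfg` unless `AIRFLOW_CONFIG` is set, with `AIRFLOW_HOME` falling back to `~/airflow`. The minimal Python sketch that follows reproduces that check from a shell inside the container; `airflow_config_path` and `can_write_config` are hypothetical helpers, not Airflow APIs.

```python
import os


def airflow_config_path(env=None):
    """Approximate where Airflow looks for its config (assumed 2.x defaults):
    AIRFLOW_CONFIG if set, else $AIRFLOW_HOME/airflow.cfg, with AIRFLOW_HOME
    defaulting to ~/airflow."""
    env = os.environ if env is None else env
    if env.get("AIRFLOW_CONFIG"):
        return os.path.expanduser(env["AIRFLOW_CONFIG"])
    home = os.path.expanduser(env.get("AIRFLOW_HOME", "~/airflow"))
    return os.path.join(home, "airflow.cfg")


def can_write_config(path):
    """Return True if the current process could create or overwrite `path`.

    os.access checks the real UID/GID of the process. Returns False when the
    parent directory does not exist.
    """
    if os.path.exists(path):
        return os.access(path, os.W_OK)
    parent = os.path.dirname(path) or "."
    # Creating a new file needs write + search permission on its directory.
    return os.access(parent, os.W_OK | os.X_OK)


if __name__ == "__main__":
    cfg = airflow_config_path()
    print(f"uid={os.getuid()} gid={os.getgid()} config={cfg} "
          f"writable={can_write_config(cfg)}")
```

Run as UID 1001 with `AIRFLOW_HOME=/home/airflow`, it should report `writable=False` if that process has no write access to `/home/airflow`, which would match the `PermissionError` above.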
   
   **What you expected to happen**: 
   
   I expected the container to initialize correctly, with file permissions matching the UID specified in the `docker-compose.yml` file, and an output resembling this:
   
   ```
   Creating be-api_airflow-init_run ... done
   BACKEND=postgresql+psycopg2
   DB_HOST=postgres
   DB_PORT=5432
   
   DB: postgresql+psycopg2://airflow:***@postgres/airflow
   [2021-07-29 16:25:03,687] {db.py:695} INFO - Creating tables
   INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
   INFO  [alembic.runtime.migration] Will assume transactional DDL.
   Upgrades done
   airflow already exist in the db
   airflow@7a15c956e187:/opt/airflow$ cd /home/airflow/
   airflow@7a15c956e187:~$
   ```
   
   P.S.: This output is obtained when setting the UID to 50000 (`AIRFLOW_UID=50000`) in the `.env` file used by `docker-compose.yml`.
   
   **What you think went wrong**:
   When using a different UID (e.g. 1001 in my case) to match the ownership of `./dags`, `./logs` and `./plugins` on the host, the error occurs. I think `UID=50000` is enforced at some point in the `Dockerfile` of the Airflow image and is not correctly substituted when `docker-compose.yml` changes this value. As a result, the files under `/home/airflow` are still created with UID 50000 as owner, while the sub-directories `./dags`, `./logs` and `./plugins` keep the UID/GID of the host system.
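To confirm this ownership mismatch, a small Python sketch that prints numeric owners and modes (similar to `ls -ldn`) together with the UID/GID of the current process can be run from a shell inside the container. The paths listed are the ones from this report and are assumptions; adjust them to your `AIRFLOW_HOME`. `ownership_report` is a hypothetical helper.

```python
import os
import stat


def ownership_report(paths):
    """Return (path, owner_uid, owner_gid, mode_string) for each path,
    roughly what `ls -ldn` shows."""
    rows = []
    for p in paths:
        st = os.stat(p)
        rows.append((p, st.st_uid, st.st_gid, stat.filemode(st.st_mode)))
    return rows


if __name__ == "__main__":
    # Paths from this report (AIRFLOW_HOME=/home/airflow); adjust as needed.
    candidates = ["/home/airflow", "/home/airflow/dags",
                  "/home/airflow/logs", "/home/airflow/plugins"]
    print(f"running as uid={os.getuid()} gid={os.getgid()}")
    existing = [p for p in candidates if os.path.exists(p)]
    for path, uid, gid, mode in ownership_report(existing):
        print(f"{mode} {uid}:{gid} {path}")
```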
    
   There are 2 major issues with the approach of using a fixed UID:
   
   1. If I have to create and use a single `UID=50000` that handles all Airflow operations, then the Airflow files on the host cannot be properly managed by different users, e.g. developers pulling new changes from git.
   2. Even if this worked properly and we could use a UID other than 50000, it would still restrict write access to a single user bound to `GID=0` (a requirement of Airflow). The result is the same limitation as mentioned earlier: only one UID will be able to modify the host file system. (Maybe I need to create a separate issue for this.)
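Point 2 comes down to POSIX permission classes: a process is checked against the owner bits if its UID matches the file owner, otherwise against the group bits if it belongs to the file's group, otherwise against the "other" bits. A simplified Python sketch of that rule follows; `can_write` is a hypothetical helper that ignores ACLs and treats root as always allowed.

```python
import stat


def can_write(mode, owner_uid, owner_gid, uid, gids):
    """Simplified POSIX write check.

    mode: st_mode permission bits; owner_uid/owner_gid: the file's owner;
    uid/gids: the accessing process's UID and set of group IDs.
    Ignores ACLs and Linux capabilities other than the root shortcut.
    """
    if uid == 0:
        return True  # root bypasses permission bits (simplification)
    if uid == owner_uid:
        return bool(mode & stat.S_IWUSR)  # only the owner class applies
    if owner_gid in gids:
        return bool(mode & stat.S_IWGRP)  # group class
    return bool(mode & stat.S_IWOTH)  # everyone else
```

With illustrative modes (assumptions, not read from the image), `can_write(0o755, 50000, 0, 1001, {0})` is `False` while `can_write(0o775, 50000, 0, 1001, {0})` is `True`: whether UIDs other than the owner can write through `GID=0` depends on the group-write bit, not only on the UID.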
   
   **How to reproduce it**:
   Create a project with the following structure:
   ```
   custom-project
    ┣ src
    ┃ ┣ dags
    ┃ ┃ ┗ hello_geeks.py
    ┃ ┣ logs
    ┃ ┗ plugins
    ┣ .env
    ┣ README.md
    ┗ docker-compose.yml
   ```
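A hypothetical helper to create this layout and the `.env` file used in this report (values copied from the `.env` shown further down). `README.md` and `hello_geeks.py` are left out, since their contents are not part of the report.

```python
from pathlib import Path


def scaffold(root, uid, gid=0, airflow_home="/home/airflow"):
    """Create src/{dags,logs,plugins} under `root` and write a .env file.

    Defaults mirror the .env from this report (AIRFLOW_GID=0,
    AIRFLOW_HOME=/home/airflow); `scaffold` is a hypothetical helper.
    """
    root = Path(root)
    for sub in ("dags", "logs", "plugins"):
        # parents=True also creates `root` and `root/src` when missing.
        (root / "src" / sub).mkdir(parents=True, exist_ok=True)
    (root / ".env").write_text(
        f"AIRFLOW_UID={uid}\n"
        f"AIRFLOW_GID={gid}\n"
        f"AIRFLOW_HOME={airflow_home}\n"
    )
    return root
```

For example, `scaffold("custom-project", uid=os.getuid())` creates the folders owned by the current host user and writes that user's UID into `.env`.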
   Use the following files and run `sudo docker-compose run airflow-init bash`.
   
   `docker-compose.yml` file:
   
   ```yaml
   # Licensed to the Apache Software Foundation (ASF) under one
   # or more contributor license agreements.  See the NOTICE file
   # distributed with this work for additional information
   # regarding copyright ownership.  The ASF licenses this file
   # to you under the Apache License, Version 2.0 (the
   # "License"); you may not use this file except in compliance
   # with the License.  You may obtain a copy of the License at
   #
   #   http://www.apache.org/licenses/LICENSE-2.0
   #
   # Unless required by applicable law or agreed to in writing,
   # software distributed under the License is distributed on an
   # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   # KIND, either express or implied.  See the License for the
   # specific language governing permissions and limitations
   # under the License.
   #
   
   # Basic Airflow cluster configuration for CeleryExecutor with Redis and PostgreSQL.
   #
   # WARNING: This configuration is for local development. Do not use it in a production deployment.
   #
   # This configuration supports basic configuration using environment variables or an .env file
   # The following variables are supported:
   #
   # AIRFLOW_IMAGE_NAME           - Docker image name used to run Airflow.
   #                                Default: apache/airflow:|version|
   # AIRFLOW_UID                  - User ID in Airflow containers
   #                                Default: 50000
   # AIRFLOW_GID                  - Group ID in Airflow containers
   #                                Default: 50000
   #
   # Those configurations are useful mostly in case of standalone testing/running Airflow in test/try-out mode
   #
   # _AIRFLOW_WWW_USER_USERNAME   - Username for the administrator account (if requested).
   #                                Default: airflow
   # _AIRFLOW_WWW_USER_PASSWORD   - Password for the administrator account (if requested).
   #                                Default: airflow
   # _PIP_ADDITIONAL_REQUIREMENTS - Additional PIP requirements to add when starting all containers.
   #                                Default: ''
   #
   # Feel free to modify this file to suit your needs.
   ---
       version: '3'
       x-airflow-common:
         &airflow-common
         image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.1.0}
         environment:
           &airflow-common-env
           AIRFLOW__CORE__EXECUTOR: CeleryExecutor
           AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
           AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
           AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
           AIRFLOW__CORE__FERNET_KEY: ''
           AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
           AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
           AIRFLOW__API__AUTH_BACKEND: 'airflow.api.auth.backend.basic_auth'
           AIRFLOW_HOME: '${AIRFLOW_HOME:-/opt/airflow}'
           _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-}
         volumes:
           - ./src/dags:${AIRFLOW_HOME:-/opt/airflow}/dags
           - ./src/logs:${AIRFLOW_HOME:-/opt/airflow}/logs
           - ./src/plugins:${AIRFLOW_HOME:-/opt/airflow}/plugins
         user: "${AIRFLOW_UID:-50000}:${AIRFLOW_GID:-50000}"
         depends_on:
           &airflow-common-depends-on
           redis:
             condition: service_healthy
           postgres:
             condition: service_healthy
       
       services:
         postgres:
           image: postgres:13
           environment:
             POSTGRES_USER: airflow
             POSTGRES_PASSWORD: airflow
             POSTGRES_DB: airflow
           volumes:
             - postgres-db-volume:/var/lib/postgresql/data
           healthcheck:
             test: ["CMD", "pg_isready", "-U", "airflow"]
             interval: 5s
             retries: 5
           restart: always
       
         redis:
           image: redis:latest
           expose:
             - 6379
           healthcheck:
             test: ["CMD", "redis-cli", "ping"]
             interval: 5s
             timeout: 30s
             retries: 50
           restart: always
       
         airflow-webserver:
           <<: *airflow-common
           command: webserver
           ports:
             - 9999:8080
           healthcheck:
             test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
             interval: 10s
             timeout: 10s
             retries: 5
           restart: always
           depends_on:
             <<: *airflow-common-depends-on
             airflow-init:
               condition: service_completed_successfully
       
         airflow-scheduler:
           <<: *airflow-common
           command: scheduler
           healthcheck:
             test: ["CMD-SHELL", 'airflow jobs check --job-type SchedulerJob --hostname "$${HOSTNAME}"']
             interval: 10s
             timeout: 10s
             retries: 5
           restart: always
           depends_on:
             <<: *airflow-common-depends-on
             airflow-init:
               condition: service_completed_successfully
       
         airflow-worker:
           <<: *airflow-common
           command: celery worker
           healthcheck:
             test:
               - "CMD-SHELL"
               - 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"'
             interval: 10s
             timeout: 10s
             retries: 5
           restart: always
           depends_on:
             <<: *airflow-common-depends-on
             airflow-init:
               condition: service_completed_successfully
       
         airflow-init:
           <<: *airflow-common
           command: version
           environment:
             <<: *airflow-common-env
             _AIRFLOW_DB_UPGRADE: 'true'
             _AIRFLOW_WWW_USER_CREATE: 'true'
             _AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow}
             _AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow}
       
         airflow-cli:
           <<: *airflow-common
           profiles:
             - debug
           environment:
             <<: *airflow-common-env
             CONNECTION_CHECK_MAX_COUNT: "0"
           # Workaround for entrypoint issue. See: https://github.com/apache/airflow/issues/16252
           command:
             - bash
             - -c
             - airflow
       
         flower:
           <<: *airflow-common
           command: celery flower
           ports:
             - 5555:5555
           healthcheck:
             test: ["CMD", "curl", "--fail", "http://localhost:5555/"]
             interval: 10s
             timeout: 10s
             retries: 5
           restart: always
           depends_on:
             <<: *airflow-common-depends-on
             airflow-init:
               condition: service_completed_successfully
       
       volumes:
         postgres-db-volume:
   
   ```
   
   `.env` file:
   ```
   AIRFLOW_UID=1001
   AIRFLOW_GID=0
   AIRFLOW_HOME=/home/airflow
   ```
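Compose reads this `.env` for variable substitution in `docker-compose.yml`, so with these values `user:` should resolve to `1001:0`, and `AIRFLOW_HOME` plus the volume targets to `/home/airflow` instead of the `/opt/airflow` default. This is consistent with the traceback, which fails on `/home/airflow/airflow.cfg`. The Python sketch that follows mimics only the `${VAR}`, `${VAR:-default}` and `$$` forms (not Compose's full interpolation rules or `.env` quoting) to show the resolved values; `parse_env` and `interpolate` are hypothetical helpers.

```python
import re

# Matches "$$" (escaped dollar), "${VAR}" and "${VAR:-default}".
_PATTERN = re.compile(r"\$\$|\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def parse_env(text):
    """Parse simple KEY=VALUE lines (no quoting or `export` handling)."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip()
    return env


def interpolate(value, env):
    """Resolve ${VAR} and ${VAR:-default}; ':-' falls back to the default
    when VAR is unset or empty, and '$$' becomes a literal '$'."""
    def repl(match):
        if match.group(0) == "$$":
            return "$"
        name, default = match.group(1), match.group(2)
        current = env.get(name, "")
        if current == "" and default is not None:
            return default
        return current
    return _PATTERN.sub(repl, value)
```

For example, `interpolate("${AIRFLOW_UID:-50000}:${AIRFLOW_GID:-50000}", env)` returns `"1001:0"` when `env` is parsed from the `.env` above, and `"50000:50000"` when neither variable is set.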
   
   