Steexyz opened a new issue, #60535: URL: https://github.com/apache/airflow/issues/60535
### Apache Airflow Provider(s)

google

### Versions of Apache Airflow Providers

19.3.0

### Apache Airflow version

3.1.6

### Operating System

Debian GNU/Linux 12 (bookworm)

### Deployment

Docker-Compose

### Deployment details

Deployment was made using the base docker-compose file.

```yaml
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
# Basic Airflow cluster configuration for CeleryExecutor with Redis and PostgreSQL.
#
# WARNING: This configuration is for local development. Do not use it in a production deployment.
#
# This configuration supports basic configuration using environment variables or an .env file
# The following variables are supported:
#
# AIRFLOW_IMAGE_NAME           - Docker image name used to run Airflow.
#                                Default: apache/airflow:3.1.6
# AIRFLOW_UID                  - User ID in Airflow containers
#                                Default: 50000
# AIRFLOW_PROJ_DIR             - Base path to which all the files will be volumed.
#                                Default: .
# Those configurations are useful mostly in case of standalone testing/running Airflow in test/try-out mode
#
# _AIRFLOW_WWW_USER_USERNAME   - Username for the administrator account (if requested).
#                                Default: airflow
# _AIRFLOW_WWW_USER_PASSWORD   - Password for the administrator account (if requested).
#                                Default: airflow
# _PIP_ADDITIONAL_REQUIREMENTS - Additional PIP requirements to add when starting all containers.
#                                Use this option ONLY for quick checks. Installing requirements at container
#                                startup is done EVERY TIME the service is started.
#                                A better way is to build a custom image or extend the official image
#                                as described in https://airflow.apache.org/docs/docker-stack/build.html.
#                                Default: ''
#
# Feel free to modify this file to suit your needs.
---
x-airflow-common:
  &airflow-common
  # In order to add custom dependencies or upgrade provider distributions you can use your extended image.
  # Comment the image line, place your Dockerfile in the directory where you placed the docker-compose.yaml
  # and uncomment the "build" line below, Then run `docker-compose build` to build the images.
  image: custom-airflow-3:3.1.6
  # build: .
  environment:
    &airflow-common-env
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__CORE__AUTH_MANAGER: airflow.providers.fab.auth_manager.fab_auth_manager.FabAuthManager
    AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    AIRFLOW__CORE__FERNET_KEY: '8_4t1Ld5YV7sS3k9n-Bv2pXgZc1R0fHwJmNlUa6I7oQ='
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
    AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
    AIRFLOW__CORE__EXECUTION_API_SERVER_URL: 'http://airflow-apiserver:8080/execution/'
    # yamllint disable rule:line-length
    # Use simple http server on scheduler for health checks
    # See https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/logging-monitoring/check-health.html#scheduler-health-check-server
    # yamllint enable rule:line-length
    AIRFLOW__SCHEDULER__ENABLE_HEALTH_CHECK: 'true'
    # WARNING: Use _PIP_ADDITIONAL_REQUIREMENTS option ONLY for a quick checks
    # The following line can be used to set a custom config file, stored in the local config folder
    AIRFLOW_CONFIG: '/opt/airflow/config/airflow.cfg'
  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - ./plugins:/opt/airflow/plugins
    - ./config:/opt/airflow/config
    - ${appdata}/gcloud:/home/airflow/.config/gcloud
  user: "${AIRFLOW_UID:-50000}:0"
  depends_on:
    &airflow-common-depends-on
    redis:
      condition: service_healthy
    postgres:
      condition: service_healthy

services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow
    volumes:
      - postgres-db-volume:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "airflow"]
      interval: 10s
      retries: 5
      start_period: 5s
    restart: always

  redis:
    # Redis is limited to 7.2-bookworm due to licencing change
    # https://redis.io/blog/redis-adopts-dual-source-available-licensing/
    image: redis:7.2-bookworm
    expose:
      - 6379
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 30s
      retries: 50
      start_period: 30s
    restart: always

  airflow-apiserver:
    <<: *airflow-common
    command: api-server
    ports:
      - "8080:8080"
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:8080/api/v2/version"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 30s
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully

  airflow-scheduler:
    <<: *airflow-common
    command: scheduler
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:8974/health"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 30s
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully

  airflow-dag-processor:
    <<: *airflow-common
    command: dag-processor
    healthcheck:
      test: ["CMD-SHELL", 'airflow jobs check --job-type DagProcessorJob --hostname "$${HOSTNAME}"']
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 30s
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully

  airflow-triggerer:
    <<: *airflow-common
    command: triggerer
    healthcheck:
      test: ["CMD-SHELL", 'airflow jobs check --job-type TriggererJob --hostname "$${HOSTNAME}"']
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 30s
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully

  airflow-worker:
    <<: *airflow-common
    command: celery worker
    healthcheck:
      # yamllint disable rule:line-length
      test:
        - "CMD-SHELL"
        - 'celery --app airflow.providers.celery.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}" || celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"'
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 30s
    environment:
      <<: *airflow-common-env
      # Required to handle warm shutdown of the celery workers properly
      # See https://airflow.apache.org/docs/docker-stack/entrypoint.html#signal-propagation
      DUMB_INIT_SETSID: "0"
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-apiserver:
        condition: service_healthy
      airflow-init:
        condition: service_completed_successfully

  airflow-init:
    <<: *airflow-common
    entrypoint: /bin/bash
    # yamllint disable rule:line-length
    command:
      - -c
      - |
        if [[ -z "50000" ]]; then
          echo
          echo -e "\033[1;33mWARNING!!!: AIRFLOW_UID not set!\e[0m"
          echo "If you are on Linux, you SHOULD follow the instructions below to set "
          echo "AIRFLOW_UID environment variable, otherwise files will be owned by root."
          echo "For other operating systems you can get rid of the warning with manually created .env file:"
          echo "    See: https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html#setting-the-right-airflow-user"
          echo
          export AIRFLOW_UID=$$(id -u)
        fi
        one_meg=1048576
        mem_available=$$(($$(getconf _PHYS_PAGES) * $$(getconf PAGE_SIZE) / one_meg))
        cpus_available=$$(grep -cE 'cpu[0-9]+' /proc/stat)
        disk_available=$$(df / | tail -1 | awk '{print $$4}')
        warning_resources="false"
        if (( mem_available < 4000 )) ; then
          echo
          echo -e "\033[1;33mWARNING!!!: Not enough memory available for Docker.\e[0m"
          echo "At least 4GB of memory required. You have $$(numfmt --to iec $$((mem_available * one_meg)))"
          echo
          warning_resources="true"
        fi
        if (( cpus_available < 2 )); then
          echo
          echo -e "\033[1;33mWARNING!!!: Not enough CPUS available for Docker.\e[0m"
          echo "At least 2 CPUs recommended. You have $${cpus_available}"
          echo
          warning_resources="true"
        fi
        if (( disk_available < one_meg * 10 )); then
          echo
          echo -e "\033[1;33mWARNING!!!: Not enough Disk space available for Docker.\e[0m"
          echo "At least 10 GBs recommended. You have $$(numfmt --to iec $$((disk_available * 1024 )))"
          echo
          warning_resources="true"
        fi
        if [[ $${warning_resources} == "true" ]]; then
          echo
          echo -e "\033[1;33mWARNING!!!: You have not enough resources to run Airflow (see above)!\e[0m"
          echo "Please follow the instructions to increase amount of resources available:"
          echo "   https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html#before-you-begin"
          echo
        fi
        echo
        echo "Creating missing opt dirs if missing:"
        echo
        mkdir -v -p /opt/airflow/{logs,dags,plugins,config}
        echo
        echo "Airflow version:"
        /entrypoint airflow version
        echo
        echo "Files in shared volumes:"
        echo
        ls -la /opt/airflow/{logs,dags,plugins,config}
        echo
        echo "Running airflow config list to create default config file if missing."
        echo
        /entrypoint airflow config list >/dev/null
        echo
        echo "Files in shared volumes:"
        echo
        ls -la /opt/airflow/{logs,dags,plugins,config}
        echo
        echo "Change ownership of files in /opt/airflow to 50000:0"
        echo
        chown -R "50000:0" /opt/airflow/
        echo
        echo "Change ownership of files in shared volumes to 50000:0"
        echo
        chown -v -R "50000:0" /opt/airflow/{logs,dags,plugins,config}
        echo
        echo "Files in shared volumes:"
        echo
        ls -la /opt/airflow/{logs,dags,plugins,config}
    # yamllint enable rule:line-length
    environment:
      <<: *airflow-common-env
      _AIRFLOW_DB_MIGRATE: 'true'
      _AIRFLOW_WWW_USER_CREATE: 'true'
      _AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow}
      _AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow}
      _PIP_ADDITIONAL_REQUIREMENTS: ''
    user: "0:0"

  airflow-cli:
    <<: *airflow-common
    profiles:
      - debug
    environment:
      <<: *airflow-common-env
      CONNECTION_CHECK_MAX_COUNT: "0"
    # Workaround for entrypoint issue. See: https://github.com/apache/airflow/issues/16252
    command:
      - bash
      - -c
      - airflow
    depends_on:
      <<: *airflow-common-depends-on

volumes:
  postgres-db-volume:
```

A custom image has been made with additional requirements:

```
apache-airflow-providers-fab
apache-airflow-providers-celery
connexion[swagger-ui]
apache-airflow-providers-jdbc
apache-airflow-providers-oracle[common.sql]
apache-airflow-providers-snowflake
apache-airflow-providers-docker
airflow-provider-rabbitmq
apache-airflow-providers-google
google-cloud-storage
pandas
zeep
xmltodict
orjson
pyodbc
```

Dockerfile:

```dockerfile
FROM apache/airflow:3.1.6-python3.12

ARG AIRFLOW_VERSION=3.1.6
ARG PYTHON_VERSION=3.12
ARG CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"

# Switch to the root user to install system-level dependencies
USER root

# Install system dependencies, including OpenJDK for JDBC
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl gnupg telnet openjdk-17-jdk && \
    curl https://packages.microsoft.com/keys/microsoft.asc | gpg --dearmor > /etc/apt/trusted.gpg.d/microsoft.gpg && \
    curl https://packages.microsoft.com/config/debian/12/prod.list > /etc/apt/sources.list.d/mssql-release.list && \
    apt-get update && \
    ACCEPT_EULA=Y apt-get install -y --no-install-recommends msodbcsql18 mssql-tools18 unixodbc-dev && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# Set JAVA_HOME environment variable
ENV JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64

# Set the container's timezone to UTC
ENV TZ=UTC
ENV PATH="$PATH:/opt/mssql-tools18/bin"

# Allow legacy TLS versions in Java's security configuration
RUN sed -i 's/TLSv1, TLSv1.1, //g' /usr/lib/jvm/java-17-openjdk-amd64/conf/security/java.security

# Create directory for JDBC drivers and download the driver
RUN mkdir -p /opt/airflow/drivers && \
    curl -o /opt/airflow/drivers/mssql-jdbc.jar https://repo1.maven.org/maven2/com/microsoft/sqlserver/mssql-jdbc/12.4.2.jre11/mssql-jdbc-12.4.2.jre11.jar && \
    curl -o /opt/airflow/drivers/ojdbc8-12.2.0.1.jar https://repo1.maven.org/maven2/com/oracle/database/jdbc/ojdbc8/12.2.0.1/ojdbc8-12.2.0.1.jar

COPY --chown=airflow:airflow openedge.jar /opt/airflow/drivers/openedge.jar

# Switch back to the non-privileged airflow user
USER airflow

# Install Python packages
COPY requirements.txt .
RUN pip install -r requirements.txt \
    --constraint "${CONSTRAINT_URL}"
```
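The Google connection relies on Application Default Credentials (ADC) made available through the `${appdata}/gcloud` volume mounted at `/home/airflow/.config/gcloud`. For context, this is roughly how I expect the credentials to resolve inside the containers (illustrative sketch only, plain `google-auth` usage rather than Airflow code):

```python
# Illustrative sketch only: standard google-auth ADC resolution, not Airflow code.
# It assumes the host's gcloud config (including application_default_credentials.json)
# is visible at /home/airflow/.config/gcloud inside the container.
import google.auth

credentials, project_id = google.auth.default()
print(f"ADC resolved, default project: {project_id}")
```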
### What happened

I am not able to retrieve a Google Cloud connection anymore.

```python
from airflow.providers.google.common.hooks.base_google import GoogleBaseHook
```

```python
hook = GoogleBaseHook(gcp_conn_id=self.connection_id)
```

This produces the following error:

```
File "/home/airflow/.local/lib/python3.12/site-packages/airflow/sdk/execution_time/task_runner.py", line 1004, in run
File "/home/airflow/.local/lib/python3.12/site-packages/airflow/sdk/execution_time/task_runner.py", line 1405, in _execute_task
File "/home/airflow/.local/lib/python3.12/site-packages/airflow/sdk/bases/operator.py", line 417, in wrapper
File "/opt/airflow/plugins/operators/extraction.py", line 100, in execute
File "/opt/airflow/plugins/operators/extraction.py", line 87, in execute
File "/opt/airflow/plugins/connectors/cloud/target/gcs_target_connector.py", line 35, in __enter__
File "/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/google/common/hooks/base_google.py", line 283, in __init__
File "/home/airflow/.local/lib/python3.12/site-packages/airflow/sdk/bases/hook.py", line 61, in get_connection
File "/home/airflow/.local/lib/python3.12/site-packages/airflow/sdk/definitions/connection.py", line 225, in get
File "/home/airflow/.local/lib/python3.12/site-packages/airflow/sdk/execution_time/context.py", line 174, in _get_connection
```

The connection can successfully be retrieved from the CLI using the following command:

```
airflow connections list

10 | gcs-bucket-project | google_cloud_platform | None | None | None | None | None | None | False | False | {'project': | google-cloud-platform
```

### What you think should happen instead

I should be able to retrieve the connection as I previously could in Airflow 2.11.0. I haven't seen any documentation that says otherwise, but I may not have stumbled upon it yet.

### How to reproduce

Try to instantiate a hook with a Google Cloud Platform connection that uses Application Default Credentials (ADC).
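A minimal sketch of the failing pattern (the DAG and task names are placeholders; the connection id is the `gcs-bucket-project` connection shown above):

```python
# Minimal illustration of the failing pattern; the DAG and task names are
# placeholders, not the real DAG from my deployment.
from airflow.providers.google.common.hooks.base_google import GoogleBaseHook
from airflow.sdk import dag, task


@dag(dag_id="gcp_conn_repro", schedule=None)
def gcp_conn_repro():
    @task
    def get_gcp_conn():
        # The failure happens here, while the hook fetches its connection in __init__
        hook = GoogleBaseHook(gcp_conn_id="gcs-bucket-project")
        return str(hook.get_credentials())

    get_gcp_conn()


gcp_conn_repro()
```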
### Anything else

This may or may not be an actual issue, and it may be related to changes in Airflow 3, but my JDBC connections do not suffer from the same problem when retrieved via airflow.hooks.base.BaseHook. I have also tried using BaseHook to retrieve this connection's information, without success.

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]