armourshield opened a new issue, #26620:
URL: https://github.com/apache/airflow/issues/26620

   ### Apache Airflow version
   
   Other Airflow 2 version
   
   ### What happened
   
   I have a setup where airflow is running in **kubernetes (EKS)** and **remote 
worker** running in *docker-compose* in a VM behind a firewall in a different 
location.
   
   **Problem**
   Airflow Web server in EKS is getting 403 forbidden error when trying to get 
logs on remote worker.
   
   **Build Version**
   - Airflow - 2.2.2
   - OS - Linux - Ubuntu 20.04 LTS
   
   **Kubernetes**
    - 1.22 (EKS)
    - Redis (Celery Broker) - Service Port exposed on 6379
    - PostgreSQL (Celery Backend) - Service Port exposed on 5432
   
   **Airflow ENV config setup**
   ```
     AIRFLOW__API__AUTH_BACKEND: airflow.api.auth.backend.basic_auth
     AIRFLOW__CELERY__BROKER_URL: 
redis://<username>:<password>@redis-master.airflow-dev.svc.cluster.local:6379/0
     AIRFLOW__CELERY__RESULT_BACKEND: >-
       
db+postgresql://<username>:<password>@db-postgresql.airflow-dev.svc.cluster.local/<db>
     AIRFLOW__CLI__ENDPOINT_URL: http://{hostname}:8080
     AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
     AIRFLOW__CORE__EXECUTOR: CeleryExecutor
     AIRFLOW__CORE__FERNET_KEY: <fernet_key>
     AIRFLOW__CORE__HOSTNAME_CALLABLE: socket.getfqdn
     AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
     AIRFLOW__CORE__SQL_ALCHEMY_CONN: >-
       
postgresql+psycopg2://<username>:<password>@db-postgresql.airflow-dev.svc.cluster.local/<db>
     AIRFLOW__LOGGING__BASE_LOG_FOLDER: /opt/airflow/logs
     AIRFLOW__LOGGING__WORKER_LOG_SERVER_PORT: '8793'
     AIRFLOW__WEBSERVER__BASE_URL: http://{hostname}:8080
     AIRFLOW__WEBSERVER__SECRET_KEY: <secret_key>
     _AIRFLOW_DB_UPGRADE: 'true'
     _AIRFLOW_WWW_USER_CREATE: 'true'
     _AIRFLOW_WWW_USER_PASSWORD: <password-webserver>
     _AIRFLOW_WWW_USER_USERNAME: <username-webserver>
   ```
   
   **Setup Test**
   
   1. Network reachability by ping - OK
   2. Celery broker reachability from both EKS and the remote worker - OK
   3. Celery backend reachability from both EKS and the remote worker - OK
   4. Firewall port exposed for the remote worker's Gunicorn log-serving API - OK
   5. curl -v telnet://<remote-worker>:8793 test - OK (connected)
   6. Airflow Flower recognizes both the Kubernetes worker and the remote worker - OK
   7. All env vars are identical across the webserver, the workers (EKS, remote) and the scheduler
   8. A queue is set up so the DAG runs on exactly that particular worker
   9. Time in Docker, the VM and EKS is all UTC; there is a slight 5 to 8 second difference between Docker and the pod in EKS
   10. Ran a webserver on the remote VM as well, which can pick up and show the logs
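
   For context on why the clock difference in item 9 may matter: the worker's log server does not use the webserver's basic-auth credentials; it expects a short-lived token in the `Authorization` header, derived from the shared `[webserver] secret_key`. If the token's validity window is only a few seconds, a 5-8 second clock skew between webserver and worker can push a freshly issued token outside the window on the worker's side. Below is a minimal, illustrative stdlib sketch of such a time-limited HMAC token; the constants and layout are assumptions for illustration, not Airflow's exact implementation (check the `serve_logs` code in your 2.2.2 images for the real parameters):

   ```python
   import hashlib
   import hmac

   SECRET_KEY = b"<secret_key>"  # placeholder; both sides must share this value
   MAX_AGE = 5  # seconds a token stays valid (illustrative value)

   def sign(path: str, now: float) -> str:
       """Issue a token binding the requested log path to an issue timestamp."""
       ts = str(int(now))
       mac = hmac.new(SECRET_KEY, f"{path}:{ts}".encode(), hashlib.sha256).hexdigest()
       return f"{ts}:{mac}"

   def verify(path: str, token: str, now: float) -> bool:
       """Reject tokens with a bad signature or ones outside the validity window."""
       ts, mac = token.split(":")
       expected = hmac.new(SECRET_KEY, f"{path}:{ts}".encode(), hashlib.sha256).hexdigest()
       if not hmac.compare_digest(mac, expected):
           return False
       return 0 <= now - int(ts) <= MAX_AGE

   token = sign("/logs/dag/task/1.log", now=1000.0)
   print(verify("/logs/dag/task/1.log", token, now=1002.0))  # 2 s later -> True
   print(verify("/logs/dag/task/1.log", token, now=1008.0))  # 8 s skew -> False
   ```

   With this kind of scheme, a worker clock that is even a few seconds off rejects otherwise-valid tokens, which surfaces as exactly the 403 described here.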
   
   **Description**
   Airflow executes the DAG on the remote worker, and the logs are visible on the remote worker itself. I have tried all combinations of settings but still keep getting 403.
   
   Another test was a plain curl with the webserver auth credentials.

   This curl was run both from EKS and from the remote server that hosts docker-compose. The result is the same on every server.
   ```
   curl --user <username-webserver> -vvv http://<remote-worker>:8793/logs/?<rest-of-the-log-url>
   Getting 403 Forbidden
   ```
   
   I might have misconfigured it, but I doubt that is the case.
   Any tips on what I am missing here? Many thanks in advance.
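
   One check worth running in every container (webserver and both workers) is to compare a fingerprint of the secret key rather than eyeballing the configs, since Helm charts and docker-compose files sometimes generate or default the key independently per deployment. A small sketch, assuming the env var name from the config above:

   ```python
   import hashlib
   import os

   # Print a fingerprint instead of the key itself, so it is safe to paste into logs.
   key = os.environ.get("AIRFLOW__WEBSERVER__SECRET_KEY", "")
   print(hashlib.sha256(key.encode()).hexdigest()[:16])
   ```

   Run it in each container (e.g. via `kubectl exec` on EKS and `docker compose exec` on the VM); if any printed fingerprint differs, that mismatch alone explains the 403.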
   
   ### What you think should happen instead
   
   The Airflow webserver in EKS should be able to fetch logs from the remote worker, since the port is accessible, the secret key and fernet key match, and all the env vars are identical.
   
   ### How to reproduce
   
   Use K3s, minikube or a Kubernetes distribution of your choice. To replicate the remote worker, run the docker-compose installation.
   
   **Build Version**
   - Airflow - 2.2.2
   - OS - Linux - Ubuntu 20.04 LTS
   
   **Kubernetes**
    - 1.22 EKS, K3S, minikube or kubernetes
    - Redis (Celery Broker) - Service Port exposed on 6379
    - PostgreSQL (Celery Backend) - Service Port exposed on 5432
   
   **Airflow ENV config setup**
   ```
     AIRFLOW__API__AUTH_BACKEND: airflow.api.auth.backend.basic_auth
     AIRFLOW__CELERY__BROKER_URL: 
redis://<username>:<password>@redis-master.airflow-dev.svc.cluster.local:6379/0
     AIRFLOW__CELERY__RESULT_BACKEND: >-
       
db+postgresql://<username>:<password>@db-postgresql.airflow-dev.svc.cluster.local/<db>
     AIRFLOW__CLI__ENDPOINT_URL: http://{hostname}:8080
     AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
     AIRFLOW__CORE__EXECUTOR: CeleryExecutor
     AIRFLOW__CORE__FERNET_KEY: <fernet_key>
     AIRFLOW__CORE__HOSTNAME_CALLABLE: socket.getfqdn
     AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
     AIRFLOW__CORE__SQL_ALCHEMY_CONN: >-
       
postgresql+psycopg2://<username>:<password>@db-postgresql.airflow-dev.svc.cluster.local/<db>
     AIRFLOW__LOGGING__BASE_LOG_FOLDER: /opt/airflow/logs
     AIRFLOW__LOGGING__WORKER_LOG_SERVER_PORT: '8793'
     AIRFLOW__WEBSERVER__BASE_URL: http://{hostname}:8080
     AIRFLOW__WEBSERVER__SECRET_KEY: <secret_key>
     _AIRFLOW_DB_UPGRADE: 'true'
     _AIRFLOW_WWW_USER_CREATE: 'true'
     _AIRFLOW_WWW_USER_PASSWORD: <password-webserver>
     _AIRFLOW_WWW_USER_USERNAME: <username-webserver>
   ```
   
   ### Operating System
   
   Ubuntu 20.04 LTS
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Docker-Compose
   
   ### Deployment details
   
   
   **Airflow ENV config setup**
   ```
     AIRFLOW__API__AUTH_BACKEND: airflow.api.auth.backend.basic_auth
     AIRFLOW__CELERY__BROKER_URL: 
redis://<username>:<password>@redis-master.airflow-dev.svc.cluster.local:6379/0
     AIRFLOW__CELERY__RESULT_BACKEND: >-
       
db+postgresql://<username>:<password>@db-postgresql.airflow-dev.svc.cluster.local/<db>
     AIRFLOW__CLI__ENDPOINT_URL: http://{hostname}:8080
     AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
     AIRFLOW__CORE__EXECUTOR: CeleryExecutor
     AIRFLOW__CORE__FERNET_KEY: <fernet_key>
     AIRFLOW__CORE__HOSTNAME_CALLABLE: socket.getfqdn
     AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
     AIRFLOW__CORE__SQL_ALCHEMY_CONN: >-
       
postgresql+psycopg2://<username>:<password>@db-postgresql.airflow-dev.svc.cluster.local/<db>
     AIRFLOW__LOGGING__BASE_LOG_FOLDER: /opt/airflow/logs
     AIRFLOW__LOGGING__WORKER_LOG_SERVER_PORT: '8793'
     AIRFLOW__WEBSERVER__BASE_URL: http://{hostname}:8080
     AIRFLOW__WEBSERVER__SECRET_KEY: <secret_key>
     _AIRFLOW_DB_UPGRADE: 'true'
     _AIRFLOW_WWW_USER_CREATE: 'true'
     _AIRFLOW_WWW_USER_PASSWORD: <password-webserver>
     _AIRFLOW_WWW_USER_USERNAME: <username-webserver>
   ```
   
   ### Anything else
   
   Log fetching keeps failing even though there is complete access to the port and to the remote worker. The 403 Forbidden keeps pointing at the secret key, even though it is identical across the whole environment.
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]