shivshav opened a new issue, #38479:
URL: https://github.com/apache/airflow/issues/38479

   ### Apache Airflow version
   
   2.8.3
   
   ### If "Other Airflow 2 version" selected, which one?
   
   _No response_
   
   ### What happened?
   
   We would like to set up Airflow to output task logs to `stdout` in addition to its usual mechanism of writing task logs to a file that then gets pushed to a remote logs destination. Setting up a custom log configuration with an extra handler to do this does not work **specifically** when using the `CeleryExecutor`: no logs appear on stdout at all, and we only get the logs normally written to the usual task log files by the `task` log handler.
   
   This configuration _does_ work when using the `LocalExecutor` (I haven't tried other executors to see whether the problem extends beyond the `CeleryExecutor`, since we only use the `LocalExecutor` in local development and the `CeleryExecutor` in our deployed environments).
   
   ### What you think should happen instead?
   
   Based on the information noted 
[here](https://airflow.apache.org/docs/apache-airflow/2.8.3/administration-and-deployment/logging-monitoring/advanced-logging-configuration.html)
 and our setup, I would've expected logs to appear in both the task log files 
and on `stdout` so our usual log collectors can collect/ship them as normal.
   
   ### How to reproduce
   
   Docker Compose setup (note: we normally use our own custom Airflow images based on the official ones, but I was able to reproduce with the official images, so the Airflow components below point to those to remove any extra moving parts):
   
   ```yaml
   version: '2.4'
   
   ## Shared YAML anchors for configuration
   ## Note: top-level keys prefixed with 'x-' are ignored by docker-compose for 
parsing, hence the naming
   # Common config for postgres connection
   x-pg-envs: &pg-envs
     POSTGRES_USER: airflow
     POSTGRES_PASSWORD: airflow
     POSTGRES_DB: airflow
     PGUSER: airflow
   
   # Common configuration for airflow containers shared as a YAML anchor
   x-airflow-app: &airflow-app
     image: apache/airflow:2.8.3-python3.11
     build:
       context: .
     restart: always
     env_file:
       - .env
     environment:
       <<: *pg-envs
       _AIRFLOW_WWW_USER_CREATE: 'true'
       _AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow}
       _AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow}
     depends_on: 
       airflow_postgres:
         condition: service_healthy
       redis:
         condition: service_healthy
     volumes:
       - airflow_logs:/opt/airflow/logs
       - ./config/airflow.cfg.dev:/opt/airflow/airflow.cfg
       - ./config/local:/opt/airflow/config
       - ./test-dags:/opt/airflow/dags/repo
   
   services:
     airflow_postgres:
       image: postgres:16
       environment:
         <<: *pg-envs
       healthcheck:
         test: ["CMD-SHELL", "pg_isready -U airflow -d airflow"]
         interval: 1s
         timeout: 5s
         retries: 10
       ports:
         - "5435:5432"
       volumes:
         - airflow_local_postgres:/var/lib/postgresql/data 
   
     redis:
       image: redis:6
       healthcheck:
         test: ["CMD", "redis-cli", "ping"]
         interval: 2s
         retries: 5
         start_period: 3s
       volumes:
       - redis_data:/data
   
     webserver:
       <<: *airflow-app
       command: ["webserver"]
       ports:
         - "8080:8080"
       healthcheck:
         test: ["CMD-SHELL", "[ -f /opt/airflow/airflow-webserver.pid ]"]
         interval: 30s
         timeout: 30s
         retries: 3
   
     scheduler:
       <<: *airflow-app
       command: ["scheduler"]
   
     # The worker and flower services aren't relevant for the LocalExecutor 
setup, just the CeleryExecutor setup
     worker:
       <<: *airflow-app
       command: ["celery", "worker"]
       healthcheck:
         test: ["CMD-SHELL", "[-f /opt/airflow/airflow-worker.pid"]
         interval: 30s
         timeout: 30s
         retries: 3
     flower:
       <<: *airflow-app
       command: ["celery", "flower"]
       healthcheck:
         test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-flower.pid ]"]
         interval: 30s
         timeout: 30s
         retries: 3
       ports:
       - "5555:5555"
   
     migrate_db:
       <<: *airflow-app
       command: ["db", "init"]
       restart: on-failure
   
   volumes:
     airflow_local_postgres:
     airflow_logs:
     redis_data:
   ```
   
   
   Custom log configuration, located in `config/local` and mounted under 
`/opt/airflow/config` in the above docker-compose.yaml
   
   ```python
   from copy import deepcopy
   import sys
   
   from airflow.config_templates.airflow_local_settings import DEFAULT_LOGGING_CONFIG


   # code taken from https://github.com/apache/airflow/discussions/29920#discussioncomment-5208504
   
   LOGGING_CONFIG = deepcopy(DEFAULT_LOGGING_CONFIG)
   LOGGING_CONFIG["handlers"]["stdout"] = {
       "class": "logging.StreamHandler",
       "formatter": "airflow",
       "stream": sys.stdout, # I have also tried `"ext://sys.stdout"` (no 
difference) and `"sys.stdout"` (crashes the task when using CeleryExecutor)
       "level": "INFO"
   }
   LOGGING_CONFIG["loggers"]["airflow.task"]["handlers"] = ["stdout", "task"]
   ```
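
   To see what actually ends up attached at runtime, a small throwaway task like the sketch below can be used to compare the handlers on the `airflow.task` logger under each executor (`dump_handlers` is a hypothetical debugging aid added here for illustration, not part of our setup):

   ```python
   # Hypothetical debugging task (illustration only): prints the handlers attached
   # to the "airflow.task" logger at runtime so LocalExecutor and CeleryExecutor
   # behaviour can be compared directly.
   import logging

   from airflow.decorators import task


   @task
   def dump_handlers():
       task_logger = logging.getLogger("airflow.task")
       for handler in task_logger.handlers:
           print(type(handler).__name__, getattr(handler, "stream", None))
   ```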
   
   Minimal reproducible DAG example (located at `./test-dags/test-dag.py` and mounted via `./test-dags:/opt/airflow/dags/repo` in the above docker-compose configuration):
   
   ```python
   import datetime
   import logging
   
   from airflow import DAG
   from airflow.decorators import task
   
   
   logger = logging.getLogger(__name__)
   
   # I also tried using the old non-TaskFlow API, with no difference, in case that was the problem
   @task
   def log_me(msg):
       logger.warning(msg)
       logger.info(msg)
   
   
   with DAG(
       dag_id="test_dag",
       start_date=datetime.datetime(2024, 3, 19),
       schedule="@daily"
   ):
       log_me("hiiiiii")
   ```
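
   As a side note, one way to sanity-check the logging outside any executor is to run the DAG file directly with `DAG.test()` (available since Airflow 2.5); the sketch below assumes the DAG above is bound to a name, e.g. `with DAG(...) as dag:`.

   ```python
   # Hypothetical local check (not part of the repro): DAG.test() runs the DAG
   # in-process, which is useful for confirming the stdout handler fires without
   # involving any executor. Assumes the DAG above is bound with `as dag`.
   if __name__ == "__main__":
       dag.test()
   ```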
   
   And finally, our custom Airflow config, with settings trimmed down to what I believe are the relevant ones (located at `./config/airflow.cfg.dev` and mounted as `/opt/airflow/airflow.cfg` in the docker-compose configuration above):
   
   ```ini
   [core]
   ...
   
   # The executor class that airflow should use. Choices include
   # SequentialExecutor, LocalExecutor, CeleryExecutor, DaskExecutor
   executor = CeleryExecutor  # If I change this to LocalExecutor, logs appear on stdout as expected
   
   ...
   
   [logging]
   # Logging class
   # Specify the class that will specify the logging configuration
   # This class has to be on the python classpath
   # logging_config_class = my.path.default_local_settings.LOGGING_CONFIG
   logging_config_class = log_config.LOGGING_CONFIG
   
   # Logging level
   logging_level = INFO
   celery_logging_level = INFO
   fab_logging_level = WARN
   ...
   ```
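
   My (possibly wrong) mental model of what `logging_config_class` does, based on the advanced logging docs, is roughly the sketch below: the dotted path is imported (the `config` folder is on `sys.path`) and the resulting dict is passed to `logging.config.dictConfig` at process start-up, so I'd expect the extra handler to be applied the same way regardless of executor:

   ```python
   # Rough sketch of my mental model (an assumption, not the actual Airflow source):
   # the dotted path from [logging] logging_config_class is imported and fed to
   # logging.config.dictConfig when the process starts.
   from importlib import import_module
   from logging.config import dictConfig

   module_path, attr_name = "log_config.LOGGING_CONFIG".rsplit(".", 1)
   logging_config = getattr(import_module(module_path), attr_name)
   dictConfig(logging_config)
   ```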
   
   ### Operating System
   
   OSX
   
   ### Versions of Apache Airflow Providers
   
   All from using the default airflow image
   ```
   apache-airflow-providers-amazon==8.16.0
   apache-airflow-providers-apache-hive==6.4.1
   apache-airflow-providers-celery==3.5.1
   apache-airflow-providers-cncf-kubernetes==7.13.0
   apache-airflow-providers-common-io==1.2.0
   apache-airflow-providers-common-sql==1.10.0
   apache-airflow-providers-docker==3.9.1
   apache-airflow-providers-elasticsearch==5.3.1
   apache-airflow-providers-ftp==3.7.0
   apache-airflow-providers-google==10.13.1
   apache-airflow-providers-grpc==3.4.1
   apache-airflow-providers-hashicorp==3.6.1
   apache-airflow-providers-http==4.8.0
   apache-airflow-providers-imap==3.5.0
   apache-airflow-providers-jdbc==4.2.1
   apache-airflow-providers-microsoft-azure==8.5.1
   apache-airflow-providers-mysql==5.5.1
   apache-airflow-providers-odbc==4.4.0
   apache-airflow-providers-openlineage==1.4.0
   apache-airflow-providers-postgres==5.10.0
   apache-airflow-providers-redis==3.6.0
   apache-airflow-providers-salesforce==5.6.1
   apache-airflow-providers-sendgrid==3.4.0
   apache-airflow-providers-sftp==4.8.1
   apache-airflow-providers-slack==8.5.1
   apache-airflow-providers-snowflake==5.2.1
   apache-airflow-providers-sqlite==3.7.0
   apache-airflow-providers-ssh==3.10.0
   ```
   
   ### Deployment
   
   Docker-Compose
   
   ### Deployment details
   
   Docker Compose version v2.24.6-desktop.1
   
   
   ### Anything else?
   
   A curious thing that may or may not be relevant to what's going on: most examples I've seen pass the string `"sys.stdout"` as the `stream` value when defining a handler. However, using that string causes the Celery worker to exit with stack traces about a string not being "writable". This surprised me and might point in the right direction for what's going wrong, or it could be a complete red herring, in which case I apologize 😅
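
   For what it's worth, the "not writable" traceback is reproducible with plain stdlib `logging`, outside Airflow entirely; my understanding (an assumption on my part) is that `StreamHandler` treats its argument as a file-like stream and only calls `.write()` on it when a record is emitted, while the `ext://sys.stdout` form is resolved to the real stream only by `logging.config.dictConfig`:

   ```python
   # Minimal stdlib-only sketch (my understanding, outside Airflow): StreamHandler
   # stores whatever it is given as the stream and later calls .write() on it, so a
   # literal "sys.stdout" string only fails once the first record is emitted.
   import logging

   handler = logging.StreamHandler("sys.stdout")  # accepted silently at construction time
   demo_logger = logging.getLogger("stream_string_demo")
   demo_logger.addHandler(handler)
   demo_logger.warning("boom")  # logging reports: AttributeError: 'str' object has no attribute 'write'
   ```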
   
   Here's a sample of the output I'd expect to see when attached to the containers with docker-compose (this was captured using the `LocalExecutor` to confirm I wasn't crazy initially):
   <img width="1010" alt="Screenshot 2024-03-25 at 7 44 00 PM" src="https://github.com/apache/airflow/assets/6853278/b9fb3cb3-ee65-4851-a58c-70f3376c8c57">
   
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

