goktugkose opened a new issue, #39613: URL: https://github.com/apache/airflow/issues/39613
### Apache Airflow version

2.9.1

### If "Other Airflow 2 version" selected, which one?

_No response_

### What happened?

We are experiencing an issue with Stackdriver logging in Airflow. As the changelog suggests, Airflow 2.9.1 updates the Google provider to 10.17.0, which should resolve the bugs with Stackdriver logging. We tested with the latest version; however, we cannot configure remote logging with Stackdriver. As an initial step, we created a log bucket in Cloud Monitoring and also created a log sink to investigate the logs. We use the configuration below in the Helm chart, `GOOGLE_APPLICATION_CREDENTIALS` is set as an environment variable, and a Google Cloud connection containing the same service account is added to Airflow with the scopes `https://www.googleapis.com/auth/cloud-platform` and `https://www.googleapis.com/auth/logging.admin`.

**Findings:**

1. The same service account works fine with GCS logging.
2. No logs are written to the `base_log_folder`.

**Problems faced:**

1. The documentation states that the logs are supposed to be shown in real time. However, we have to wait for the logs to be loaded into the Airflow UI.
2. The shown logs do not contain the application logs printed with the `logging` library. (I share the DAG file below.)

**Example DAG File:**

```
from datetime import datetime
from airflow import DAG
from airflow.operators.python_operator import PythonOperator
import logging, time

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)

default_args = {
    'owner': 'admin',
    'depends_on_past': False,
    'start_date': datetime(2024, 4, 24),
    "retries": 0,
}

dag = DAG(
    'test',
    default_args=default_args,
    schedule_interval="*/20 * * * *",
    catchup=False,
    max_active_runs=1,
    tags=["test"],
)

def func():
    steps = 10
    logger.info(f"Executing {steps} steps...")
    for i in range(10):
        logger.info(f"Step {i+1} executed.")
        time.sleep(1)
    logger.info("Successfully finished!")

task_1 = PythonOperator(
    dag=dag,
    task_id='task_1',
    python_callable=func,
)

task_1
```

**Produced Logs (not real time; they appear after task completion):**

```
[2024-05-09, 16:40:04 +03] {local_task_job_runner.py:120} ▼ Pre task execution logs
[2024-05-09, 16:40:05 +03] {taskinstance.py:2076} INFO - Dependencies all met for dep_context=non-requeueable deps ti=<TaskInstance: test.task_1 scheduled__2024-05-09T13:20:00+00:00 [queued]>
[2024-05-09, 16:40:05 +03] {taskinstance.py:2076} INFO - Dependencies all met for dep_context=requeueable deps ti=<TaskInstance: test.task_1 scheduled__2024-05-09T13:20:00+00:00 [queued]>
[2024-05-09, 16:40:05 +03] {taskinstance.py:2306} INFO - Starting attempt 1 of 1
[2024-05-09, 16:40:06 +03] {taskinstance.py:2330} INFO - Executing <Task(PythonOperator): task_1> on 2024-05-09 13:20:00+00:00
[2024-05-09, 16:40:06 +03] {warnings.py:110} WARNING - /home/airflow/.local/lib/python3.12/site-packages/airflow/task/task_runner/standard_task_runner.py:61: DeprecationWarning: This process (pid=865) is multi-threaded, use of fork() may lead to deadlocks in the child.
  pid = os.fork()
[2024-05-09, 16:40:06 +03] {standard_task_runner.py:63} INFO - Started process 882 to run task
[2024-05-09, 16:40:31 +03] {local_task_job_runner.py:310} WARNING - State of this instance has been externally set to success. Terminating instance.
[2024-05-09, 16:40:31 +03] {local_task_job_runner.py:222} ▲▲▲ Log group end
```

```
[2024-05-09, 16:40:31 +03] {process_utils.py:132} INFO - Sending 15 to group 882. PIDs of all processes in the group: [882]
[2024-05-09, 16:40:31 +03] {process_utils.py:87} INFO - Sending the signal 15 to group 882
[2024-05-09, 16:40:31 +03] {process_utils.py:80} INFO - Process psutil.Process(pid=882, status='terminated', exitcode=0, started='13:40:05') (882) terminated with exit code 0
```
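To check whether the task log entries reach Cloud Logging at all, I use a small script along the lines of the sketch below, built on the standard `google-cloud-logging` client. The project ID and log name here are placeholders (the log name should match whatever is set in `remote_base_log_folder`), not values from our setup:

```
# Minimal sketch: list the most recent entries from the configured Stackdriver log.
# "my-project" and "airflow-tasks" are placeholders, not values from our deployment.
from google.cloud import logging as gcp_logging

client = gcp_logging.Client(project="my-project")

# The log name corresponds to the <log-name> part of remote_base_log_folder
# (stackdriver:///<log-name>).
log_filter = 'logName="projects/my-project/logs/airflow-tasks"'

for entry in client.list_entries(
    filter_=log_filter,
    order_by=gcp_logging.DESCENDING,
    max_results=20,
):
    print(entry.timestamp, entry.labels, entry.payload)
```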
### What you think should happen instead?

As suggested by the Airflow documentation, logs should be loaded in real time; other logging configurations such as Elasticsearch provide this behavior. Task logs should also be served in the UI. I have tested with GCS logging, and we expect to see similar logs with Stackdriver logging.

### How to reproduce

- `GOOGLE_APPLICATION_CREDENTIALS` with roles `roles/logging.admin` and `roles/monitoring.admin`
- `Helm 3.12.3`
- Celery Executor
- `apache-airflow-providers-google==10.17.0`
- Airflow 2.9.1
- Helm Chart values (the equivalent environment variables are sketched at the end of this issue):

```
config:
  logging:
    remote_logging: 'True'
    remote_base_log_folder: "stackdriver:///<log-name>"
    remote_log_conn_id: "<connection-id>"
    google_key_path: "<path-to-mounted-service-account-file>"
```

### Operating System

Debian GNU/Linux 12 (bookworm)

### Versions of Apache Airflow Providers

```
apache-airflow-providers-amazon | 8.20.0
apache-airflow-providers-celery | 3.6.2
apache-airflow-providers-cncf-kubernetes | 8.1.1
apache-airflow-providers-common-io | 1.3.1
apache-airflow-providers-common-sql | 1.12.0
apache-airflow-providers-docker | 3.10.0
apache-airflow-providers-elasticsearch | 5.3.4
apache-airflow-providers-fab | 1.0.4
apache-airflow-providers-ftp | 3.8.0
apache-airflow-providers-google | 10.17.0
apache-airflow-providers-grpc | 3.4.1
apache-airflow-providers-hashicorp | 3.6.4
apache-airflow-providers-http | 4.10.1
apache-airflow-providers-imap | 3.5.0
apache-airflow-providers-microsoft-azure | 10.0.0
apache-airflow-providers-mysql | 5.5.4
apache-airflow-providers-odbc | 4.5.0
apache-airflow-providers-openlineage | 1.7.0
apache-airflow-providers-postgres | 5.10.2
apache-airflow-providers-redis | 3.6.1
apache-airflow-providers-sendgrid | 3.4.0
apache-airflow-providers-sftp | 4.9.1
apache-airflow-providers-slack | 8.6.2
apache-airflow-providers-smtp | 1.6.1
apache-airflow-providers-snowflake | 5.4.0
apache-airflow-providers-sqlite | 3.7.1
apache-airflow-providers-ssh | 3.10.1
```

### Deployment

Official Apache Airflow Helm Chart

### Deployment details

_No response_

### Anything else?

_No response_

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
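For completeness, this is how I understand the logging settings from the Helm `config:` block above map to plain Airflow environment variables (standard `AIRFLOW__SECTION__KEY` naming); the placeholder values are the same ones used above:

```
AIRFLOW__LOGGING__REMOTE_LOGGING=True
AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER=stackdriver:///<log-name>
AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID=<connection-id>
AIRFLOW__LOGGING__GOOGLE_KEY_PATH=<path-to-mounted-service-account-file>
```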
