goktugkose opened a new issue, #39613:
URL: https://github.com/apache/airflow/issues/39613

   ### Apache Airflow version
   
   2.9.1
   
   ### If "Other Airflow 2 version" selected, which one?
   
   _No response_
   
   ### What happened?
   
   We are experiencing an issue with Stackdriver logging in Airflow. According to the changelog, Airflow 2.9.1 updates the Google provider to 10.17.0, which is supposed to fix the known Stackdriver logging bugs, so we tested with this latest version. However, we still cannot get remote logging with Stackdriver to work. As an initial step, we created a log bucket in Cloud Monitoring and a log sink to inspect the logs. We use the Helm chart configuration shown under "How to reproduce" below, `GOOGLE_APPLICATION_CREDENTIALS` is set as an environment variable, and a Google Cloud connection containing the same service account is added to Airflow with the scopes `https://www.googleapis.com/auth/cloud-platform` and `https://www.googleapis.com/auth/logging.admin`.
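
   To make the setup easier to verify, here is a hedged diagnostic sketch (the values in angle brackets are the placeholders from the configuration below): it simply reads the remote-logging settings back through Airflow's configuration API, e.g. from a throwaway task or a Python shell on the worker, to confirm they actually reached the running process.

   ```
   # Hedged diagnostic sketch: read back the effective remote-logging settings
   # from inside a running Airflow process.
   from airflow.configuration import conf

   print(conf.getboolean("logging", "remote_logging"))           # expected: True
   print(conf.get("logging", "remote_base_log_folder"))          # expected: stackdriver:///<log-name>
   print(conf.get("logging", "remote_log_conn_id"))              # expected: <connection-id>
   print(conf.get("logging", "google_key_path", fallback=None))  # expected: the mounted key path
   ```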
   
   **Findings:**
   1. The same service account works fine with GCS logging.
   2. No logs are written to the `base_log_folder`.
   
   **Problems faced:**
   1. The documentation states that the logs are supposed to be shown in real time; however, we have to wait for the logs to be loaded into the Airflow UI after the task finishes.
   2. The logs that are shown do not contain the application logs printed with the `logging` library (the example DAG file is shared below, together with a hedged variant of the task callable after it).
   
   **Example DAG File:**
   
   ```
   from datetime import datetime
   from airflow import DAG
   from airflow.operators.python_operator import PythonOperator
   
   import logging, time
   logger = logging.getLogger(__name__)
   logging.basicConfig(level=logging.INFO)
   
   default_args = {
       'owner': 'admin',
       'depends_on_past': False,
       'start_date': datetime(2024, 4, 24),
       "retries": 0,
   }
   
   dag = DAG('test', default_args=default_args, schedule_interval="*/20 * * * *", catchup=False, max_active_runs=1, tags=["test"])
   
   def func():
       steps = 10
       logger.info(f"Executing {steps} steps...")
       for i in range(10):
           logger.info(f"Step {i+1} executed.")
           time.sleep(1)
   
       logger.info("Successfully finished!")
   
   task_1 = PythonOperator(
       dag = dag,
       task_id = 'task_1',
       python_callable = func
   )
   
   task_1
   ```
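
   For comparison, here is a hedged variant of `func` that emits its messages through the `airflow.task` logger (the logger Airflow attaches its task handlers to) instead of a module-level logger. This is only a diagnostic sketch; whether it changes what the Stackdriver handler receives is exactly what we are unsure about:

   ```
   # Hedged variant: log through the "airflow.task" logger instead of a
   # module-level logger; everything else matches the task above.
   import logging
   import time

   task_logger = logging.getLogger("airflow.task")

   def func_via_task_logger():
       steps = 10
       task_logger.info("Executing %d steps...", steps)
       for i in range(steps):
           task_logger.info("Step %d executed.", i + 1)
           time.sleep(1)
       task_logger.info("Successfully finished!")
   ```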
   
   **Produced Logs (not real time, they appear after task completion):**
   
   ```
   [2024-05-09, 16:40:04 +03] {local_task_job_runner.py:120} ▼ Pre task execution logs
   [2024-05-09, 16:40:05 +03] {taskinstance.py:2076} INFO - Dependencies all met for dep_context=non-requeueable deps ti=<TaskInstance: test.task_1 scheduled__2024-05-09T13:20:00+00:00 [queued]>
   [2024-05-09, 16:40:05 +03] {taskinstance.py:2076} INFO - Dependencies all met for dep_context=requeueable deps ti=<TaskInstance: test.task_1 scheduled__2024-05-09T13:20:00+00:00 [queued]>
   [2024-05-09, 16:40:05 +03] {taskinstance.py:2306} INFO - Starting attempt 1 of 1
   [2024-05-09, 16:40:06 +03] {taskinstance.py:2330} INFO - Executing <Task(PythonOperator): task_1> on 2024-05-09 13:20:00+00:00
   [2024-05-09, 16:40:06 +03] {warnings.py:110} WARNING - /home/airflow/.local/lib/python3.12/site-packages/airflow/task/task_runner/standard_task_runner.py:61: DeprecationWarning: This process (pid=865) is multi-threaded, use of fork() may lead to deadlocks in the child.
     pid = os.fork()

   [2024-05-09, 16:40:06 +03] {standard_task_runner.py:63} INFO - Started process 882 to run task
   [2024-05-09, 16:40:31 +03] {local_task_job_runner.py:310} WARNING - State of this instance has been externally set to success. Terminating instance.
   [2024-05-09, 16:40:31 +03] {local_task_job_runner.py:222} ▲▲▲ Log group end
   ```
   
   ```
   [2024-05-09, 16:40:31 +03] {process_utils.py:132} INFO - Sending 15 to group 882. PIDs of all processes in the group: [882]
   [2024-05-09, 16:40:31 +03] {process_utils.py:87} INFO - Sending the signal 15 to group 882
   [2024-05-09, 16:40:31 +03] {process_utils.py:80} INFO - Process psutil.Process(pid=882, status='terminated', exitcode=0, started='13:40:05') (882) terminated with exit code 0
   ```
   
   ### What you think should happen instead?
   
   As suggested by the Airflow documentation, logs should be loaded in real time; other logging configurations, such as Elasticsearch, provide this behavior. Task logs should also be served in the UI. I have tested with GCS logging, and we expect to see similar logs with Stackdriver logging.
   
   ### How to reproduce
   
   - `GOOGLE_APPLICATION_CREDENTIALS` pointing to a service account with the roles `roles/logging.admin` and `roles/monitoring.admin`
   - `Helm 3.12.3`
   - Celery Executor
   - `apache-airflow-providers-google==10.17.0`
   - Airflow 2.9.1
   - Helm Chart values:
   ```
   config:
     logging:
       remote_logging: 'True'
       remote_base_log_folder: "stackdriver:///<log-name>"
       remote_log_conn_id: "<connection-id>"
       google_key_path: "<path-to-mounted-service-account-file>"
   ```
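
   As an additional sanity check that the mounted service account can write to Cloud Logging outside Airflow, here is a hedged sketch using the `google-cloud-logging` client (the log name below is a placeholder):

   ```
   # Hedged sketch: write one entry to Cloud Logging using only the mounted
   # service account (picked up via GOOGLE_APPLICATION_CREDENTIALS), bypassing
   # Airflow entirely. The log name is a placeholder.
   from google.cloud import logging as gcp_logging

   client = gcp_logging.Client()
   logger = client.logger("airflow-stackdriver-connectivity-test")
   logger.log_text("Connectivity test from the Airflow worker image.")
   print("Wrote a test entry; check it in Logs Explorer.")
   ```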
   
   ### Operating System
   
   Debian GNU/Linux 12 (bookworm)
   
   ### Versions of Apache Airflow Providers
   
   ```
   apache-airflow-providers-amazon          | 8.20.0 
   apache-airflow-providers-celery          | 3.6.2  
   apache-airflow-providers-cncf-kubernetes | 8.1.1  
   apache-airflow-providers-common-io       | 1.3.1  
   apache-airflow-providers-common-sql      | 1.12.0 
   apache-airflow-providers-docker          | 3.10.0 
   apache-airflow-providers-elasticsearch   | 5.3.4  
   apache-airflow-providers-fab             | 1.0.4  
   apache-airflow-providers-ftp             | 3.8.0  
   apache-airflow-providers-google          | 10.17.0
   apache-airflow-providers-grpc            | 3.4.1  
   apache-airflow-providers-hashicorp       | 3.6.4  
   apache-airflow-providers-http            | 4.10.1 
   apache-airflow-providers-imap            | 3.5.0  
   apache-airflow-providers-microsoft-azure | 10.0.0 
   apache-airflow-providers-mysql           | 5.5.4  
   apache-airflow-providers-odbc            | 4.5.0  
   apache-airflow-providers-openlineage     | 1.7.0  
   apache-airflow-providers-postgres        | 5.10.2 
   apache-airflow-providers-redis           | 3.6.1  
   apache-airflow-providers-sendgrid        | 3.4.0  
   apache-airflow-providers-sftp            | 4.9.1  
   apache-airflow-providers-slack           | 8.6.2  
   apache-airflow-providers-smtp            | 1.6.1  
   apache-airflow-providers-snowflake       | 5.4.0  
   apache-airflow-providers-sqlite          | 3.7.1  
   apache-airflow-providers-ssh             | 3.10.1
   ```
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   _No response_
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

