walter9388 opened a new issue, #45554:
URL: https://github.com/apache/airflow/issues/45554

   ### Apache Airflow version
   
   Other Airflow 2 version (please specify below)
   
   ### If "Other Airflow 2 version" selected, which one?
   
   2.10.1
   
   ### What happened?
   
   When using CloudWatch logging, the logging output in the UI only updates every 60 seconds. Please see the video below and observe:
   1. Initially Airflow can't find the remote logs (as there are none).
   2. Airflow detects local logs.
   3. Nothing happens in the UI for 60 seconds.
   4. After 60 seconds the logging appears, and the top of the printout states it is from CloudWatch logs.
   5. The logs then continue to update every 60 seconds until the task completes.
   
   _Please skip ahead in the video as most of it is static!_
   
   
https://github.com/user-attachments/assets/d3abd82b-793a-475a-9488-746d640573c7
   
   As a second minor point, you can also see that log grouping no longer works with the logs read from CloudWatch. However, this doesn't concern me as much.
   
   ### What you think should happen instead?
   
   I'm not sure if I have configured something incorrectly, but I expected the 
same behaviour as local logging, i.e. tailing of the log file.
   
   I struggled to find the default behaviour documented, but what I expected to happen was that Airflow would use the local logs if they were available and only fall back to the remote logs if no local logs were found.
   I found this logic in previous [documentation (<2.0)](https://airflow.apache.org/docs/apache-airflow/1.10.8/howto/write-logs.html), although this may now be outdated:
   > In the Airflow Web UI, remote logs take precedence over local logs when 
remote logging is enabled. If remote logs can not be found or accessed, local 
logs will be displayed. Note that logs are only sent to remote storage once a 
task is complete (including failure); In other words, remote logs for running 
tasks are unavailable (but local logs are available).
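
   The fallback logic described in that quote can be sketched roughly as follows (the function and reader names here are illustrative, not the actual Airflow internals):

```python
def read_task_logs(remote_reader, local_reader):
    """Illustrative precedence: prefer remote logs, fall back to local.

    `remote_reader` / `local_reader` are hypothetical callables that
    return a list of log lines, or None when no logs are found.
    """
    remote = remote_reader()
    if remote is not None:
        return ("cloudwatch", remote)
    local = local_reader()
    if local is not None:
        return ("local", local)
    return ("none", [])
```

   If that is still the intended precedence, then for a *running* task (whose logs have not been shipped to CloudWatch yet) I would expect to see the local logs tailed.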
   
   Can you confirm whether this is still the expected behaviour, and whether what is shown in the video above is a bug?
   
   Alternatively, I can see in the browser that a request is made every second to update the logs, and I can confirm that the logs are only being written to CloudWatch every 60 seconds or when the task is complete.
   Is this expected behaviour, or should logs be written to CloudWatch at a higher rate?
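
   For context, my working assumption (not verified against the provider source) is that the CloudWatch handler buffers records and only ships them periodically, roughly like this interval-based batching pattern:

```python
import time


class IntervalBatchingHandler:
    """Toy sketch of interval-based log shipping -- the batching pattern
    I suspect is behind the 60 s delay, NOT the real provider handler."""

    def __init__(self, ship, send_interval=60.0, now=time.monotonic):
        self.ship = ship                    # callable receiving a batch of records
        self.send_interval = send_interval  # seconds between shipments
        self.now = now                      # injectable clock, for testing
        self.buffer = []
        self.last_send = now()

    def emit(self, record):
        self.buffer.append(record)
        # Records only leave the process once per send_interval.
        if self.now() - self.last_send >= self.send_interval:
            self.flush()

    def flush(self):
        if self.buffer:
            self.ship(list(self.buffer))
            self.buffer.clear()
        self.last_send = self.now()
```

   With a pattern like this, log lines emitted every 10 seconds would still surface in CloudWatch in 60-second chunks, which matches what I observe.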
   
   If the behaviour in the video is actually what is expected, I would like to suggest one of the following options, as we need a <60-second refresh window in our logging setup:
   1. A configuration variable to use local logging first if available (e.g. 
`local_logging_prefer = True`).
   2. A configuration variable for the update frequency of the logging when 
using remote logging (e.g. `remote_logging_refresh_period = 60`).
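
   For illustration, either option could look something like this in `airflow.cfg` (both option names are hypothetical; as far as I can tell nothing like them exists today):

   ```
   [logging]
   # Option 1 (hypothetical): serve local logs first while the task is running.
   local_logging_prefer = True
   # Option 2 (hypothetical): how often logs are flushed to remote storage, in seconds.
   remote_logging_refresh_period = 10
   ```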
   
   Let me know your thoughts.
   
   ### How to reproduce
   
   The remote logging config was copied from 
[here](https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/logging/cloud-watch-task-handlers.html):
   ```
   [logging]
   # Airflow can store logs remotely in AWS Cloudwatch. Users must supply a log 
group
   # ARN (starting with 'cloudwatch://...') and an Airflow connection
   # id that provides write and read access to the log location.
   remote_logging = True
   remote_base_log_folder = cloudwatch://arn:aws:logs:<region name>:<account 
id>:log-group:<group name>
   remote_log_conn_id = MyCloudwatchConn
   ```
   
   The demo DAG used in the video above prints to logging every 10 seconds and 
is as follows:
   ```python
   import logging
   from datetime import datetime
   from time import sleep

   from airflow.decorators import task
   from airflow.models import DAG

   with DAG(
       dag_id="dev__cloudwatch_logging_testing",
       start_date=datetime(2024, 1, 1),
       schedule=None,
   ):

       @task
       def task1():
           # Emit a log line every 10 seconds for 5 minutes.
           sleeptime = 10
           for i in range(0, 300, sleeptime):
               logging.info(i)
               sleep(sleeptime)

       task1()
   ```
   
   ### Operating System
   
   NAME="Ubuntu" VERSION="20.04.6 LTS (Focal Fossa)" ID=ubuntu ID_LIKE=debian PRETTY_NAME="Ubuntu 20.04.6 LTS" VERSION_ID="20.04" HOME_URL="https://www.ubuntu.com/" SUPPORT_URL="https://help.ubuntu.com/" BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/" PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy" VERSION_CODENAME=focal UBUNTU_CODENAME=focal
   
   ### Versions of Apache Airflow Providers
   
   ```
   apache-airflow==2.10.1
   apache-airflow-providers-amazon==8.28.0
   apache-airflow-providers-celery==3.8.1
   apache-airflow-providers-cncf-kubernetes==8.4.1
   apache-airflow-providers-common-compat==1.2.0
   apache-airflow-providers-common-io==1.4.0
   apache-airflow-providers-common-sql==1.16.0
   apache-airflow-providers-docker==3.13.0
   apache-airflow-providers-elasticsearch==5.5.0
   apache-airflow-providers-fab==1.3.0
   apache-airflow-providers-ftp==3.11.0
   apache-airflow-providers-google==10.22.0
   apache-airflow-providers-grpc==3.6.0
   apache-airflow-providers-hashicorp==3.8.0
   apache-airflow-providers-http==4.13.0
   apache-airflow-providers-imap==3.7.0
   apache-airflow-providers-microsoft-azure==10.4.0
   apache-airflow-providers-mysql==5.7.0
   apache-airflow-providers-odbc==4.7.0
   apache-airflow-providers-openlineage==1.11.0
   apache-airflow-providers-postgres==5.12.0
   apache-airflow-providers-redis==3.8.0
   apache-airflow-providers-sendgrid==3.6.0
   apache-airflow-providers-sftp==4.11.0
   apache-airflow-providers-slack==8.9.0
   apache-airflow-providers-smtp==1.8.0
   apache-airflow-providers-snowflake==5.7.0
   apache-airflow-providers-sqlite==3.9.0
   apache-airflow-providers-ssh==3.13.1
   ```
   
   ### Deployment
   
   Other Docker-based deployment
   
   ### Deployment details
   
   _No response_
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

