geocomm-descue opened a new issue, #57356:
URL: https://github.com/apache/airflow/issues/57356

   ### Apache Airflow version
   
   3.1.0
   
   ### If "Other Airflow 2/3 version" selected, which one?
   
   _No response_
   
   ### What happened?
   
   We have been having an issue with logging to CloudWatch in all of our Airflow components except the DAG Processor. We run a custom Airflow image that uses the KubernetesExecutor to provision workers for our Airflow tasks. Even though we first saw the error with our custom image, we see the same behavior described below in a vanilla deployment of Airflow 3.0.2 and 3.1 deployed via the Airflow Helm Chart, with all defaults except those specified in the Helm values file provided below. The tasks are not logging to CloudWatch, and after the DAG run completes we get the following error.
   
   ```
   Log message source details sources=[
     "Reading remote log from Cloudwatch log_group: /io/tasks log_stream: dag_id=diagnostic_dag/run_id=manual__2025-10-21T14:43:30.318780+00:00/task_id=check_accessability_of_io_services/attempt=1.log",
     "An error occurred (ResourceNotFoundException) when calling the GetLogEvents operation: The specified log stream does not exist.",
     "Could not read served logs: HTTPConnectionPool(host='diagnostic-dag-check-accessability-of-io-services-484wha5d', port=8793): Max retries exceeded with url: /log/dag_id=diagnostic_dag/run_id=manual__2025-10-21T14:43:30.318780+00:00/task_id=check_accessability_of_io_services/attempt=1.log (Caused by NameResolutionError(\"<urllib3.connection.HTTPConnection object at 0x7fbbfc64a270>: Failed to resolve 'diagnostic-dag-check-accessability-of-io-services-484wha5d' ([Errno -2] Name or service not known)\"))"
   ]
   ```
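
   For reference, whether that stream was ever created can be checked directly against CloudWatch Logs with boto3. This is a minimal sketch, not part of our deployment; the log group and stream name are copied from the error above, and AWS credentials/region are assumed to come from the environment:

   ```python
   import boto3

   # Assumes AWS credentials and region are already configured in the environment.
   logs = boto3.client("logs")

   # Log group and stream name copied from the error message above.
   group = "/io/tasks"
   stream = (
       "dag_id=diagnostic_dag/run_id=manual__2025-10-21T14:43:30.318780+00:00"
       "/task_id=check_accessability_of_io_services/attempt=1.log"
   )

   # Listing streams by prefix shows whether the worker ever created the stream.
   resp = logs.describe_log_streams(logGroupName=group, logStreamNamePrefix=stream)
   print([s["logStreamName"] for s in resp.get("logStreams", [])])  # [] -> never created
   ```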
   
   While the task is running, we can see the API Server accessing the task logs by falling back to the kube API:
   ```
   Log message source details sources=[
     "Reading remote log from Cloudwatch log_group: /io/tasks log_stream: dag_id=diagnostic_dag/run_id=manual__2025-10-23T01:14:57.974439+00:00/task_id=trigger_build_virtual_environment/attempt=1.log",
     "An error occurred (ResourceNotFoundException) when calling the GetLogEvents operation: The specified log stream does not exist.",
     "Attempting to fetch logs from pod diagnostic-dag-trigger-build-virtual-environment-cy19vpom through kube API",
     "Found logs through kube API"
   ]
   ```
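
   That fallback appears equivalent to reading the pod log through the Kubernetes API directly, which only works while the pod still exists. A sketch using the kubernetes Python client; the pod name is taken from the log above, and the namespace is an assumption:

   ```python
   from kubernetes import client, config

   # Assumes a local kubeconfig (use config.load_incluster_config() inside the cluster).
   config.load_kube_config()
   v1 = client.CoreV1Api()

   # Pod name taken from the log line above; the "airflow" namespace is an assumption.
   print(v1.read_namespaced_pod_log(
       name="diagnostic-dag-trigger-build-virtual-environment-cy19vpom",
       namespace="airflow",
   ))
   ```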
   
   CloudWatch is still correctly getting the logs for the pod, but it seems the pod is not creating the expected log stream, which prevents the API Server from displaying the task logs once the pod is gone. In the pod logs we do see a warning from Watchtower.
   
   ```
   {"timestamp":"2025-10-23T18:59:57.889967Z","level":"warning","event":"Received message after logging system shutdown","category":"WatchtowerWarning","filename":"/home/airflow/.local/lib/python3.12/site-packages/watchtower/__init__.py","lineno":464,"logger":"py.warnings"}
   ```
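
   The warning suggests that log records are still being emitted after the Watchtower handler has been shut down, at which point they are dropped rather than sent to CloudWatch. A minimal sketch of that mechanism, assuming watchtower 3.x and valid AWS credentials (the group/stream names here are made up):

   ```python
   import logging

   import watchtower

   # Hypothetical group/stream names, purely to illustrate the shutdown behavior.
   handler = watchtower.CloudWatchLogHandler(
       log_group_name="watchtower-demo",
       log_stream_name="demo-stream",
   )
   logger = logging.getLogger("demo")
   logger.addHandler(handler)

   logger.warning("this record is queued and flushed to CloudWatch")
   handler.close()  # flushes the queue and marks the handler as shut down

   # Emitting after close() produces the same warning seen in the pod logs:
   #   WatchtowerWarning: Received message after logging system shutdown
   logger.warning("this record is dropped")
   ```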
   
   As mentioned, this error was encountered with our custom 3.1 image as well as with the community-maintained 3.0.2 and 3.1 images deployed via the Official Airflow Helm Chart.
   
   ### What you think should happen instead?
   
   I would expect all Airflow components to correctly create their log streams in CloudWatch so that the API Server can fetch and render the task logs.
   
   ### How to reproduce
   
   Run a DAG using the deployment details below. You should see the task logs while the DAG is running and the pod is alive; once the pod is killed, the API Server stops showing logs.
   
   ### Operating System
   
   Ubuntu 22.01
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   Below is the Helm values file we passed to the Official Airflow Helm Chart to reproduce the logging issue described above.
   
   ```yaml
   config:
     logging:
       remote_logging: "True"
       remote_base_log_folder: "cloudwatch://arn:aws:logs:us-east-1:<account-id>:log-group:airflowtest"
       remote_log_conn_id: "aws_default"
       logging_level: "DEBUG"
   env:
     - name: "AIRFLOW_CONN_AWS_DEFAULT"
       value: "aws://"
   redis:
     enabled: false
   executor: "KubernetesExecutor"
   postgresql:
     enabled: false
   data:
     metadataConnection:
       user: "postgres"
       pass: "<password>"
       host: "<instance-id>.us-east-1.rds.amazonaws.com"
       sslmode: "require"
   statsd:
     enabled: false
   triggerer:
     persistence:
       enabled: false
   extraConfigMaps:
     'dag-cm':
       data: |
         test-dag.py: |
           from airflow.sdk import DAG
           from airflow.providers.standard.operators.python import PythonOperator
           from datetime import datetime
   
   
           with DAG(dag_id="example_dag", start_date=datetime(2020, 1, 1)) as dag:
               def print_hello():
                   print("Hello world!")
               def print_goodbye():
                   print("Goodbye world!")
               hello_task = PythonOperator.partial(
                   task_id="hello_task",
               ).expand(python_callable=[print_hello, print_goodbye])
   
               hello_task
   volumes:
     - name: dag-cm
       configMap:
         name: dag-cm
     - name: airflow-data
       persistentVolumeClaim:
         claimName: "airflow-test"
   volumeMounts:
     - name: dag-cm
       mountPath: /opt/airflow/dags/test-dag.py
       subPath: test-dag.py
     - mountPath: /tenant/airflow/logs
       name: airflow-data
   defaultAirflowTag: "3.1.0"
   airflowVersion: "3.1.0"
   ```
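
   With these values, the log group the API Server reads from is `airflowtest`, and the stream it expects follows the `dag_id=.../run_id=.../task_id=.../attempt=N.log` pattern seen in the logs above. A sketch to tail that stream after a run (the run_id is a placeholder, and mapped tasks may include a map index in the stream name):

   ```python
   import boto3

   logs = boto3.client("logs")

   # Group name from remote_base_log_folder above; <run-id> is a placeholder --
   # substitute the run_id of an actual run (and map index for mapped tasks).
   group = "airflowtest"
   stream = "dag_id=example_dag/run_id=<run-id>/task_id=hello_task/attempt=1.log"

   for event in logs.get_log_events(logGroupName=group, logStreamName=stream)["events"]:
       print(event["message"])
   ```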
   
   
   ### Anything else?
   
   I know there is a closed issue that describes exactly what we are experiencing, and people still seem to be hitting it as well:
   https://github.com/apache/airflow/issues/52501
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

