geocomm-descue opened a new issue, #57356:
URL: https://github.com/apache/airflow/issues/57356
### Apache Airflow version
3.1.0
### If "Other Airflow 2/3 version" selected, which one?
_No response_
### What happened?
We have been having an issue with logging to CloudWatch in all of our
Airflow components except the DAG Processor. We use a custom Airflow image
with the KubernetesExecutor to provision workers for our Airflow tasks.
Although we first saw the error with our custom image, we see the same
behavior described below in a vanilla deployment of Airflow 3.0.2 and 3.1
deployed via the Airflow Helm Chart with all defaults except those set in the
Helm values file provided below. The tasks are not logging to CloudWatch, and
after the DAG run completes we get the following error.
```
Log message source details sources=["Reading remote log from Cloudwatch
log_group: /io/tasks log_stream:
dag_id=diagnostic_dag/run_id=manual__2025-10-21T14:43:30.318780+00:00/task_id=check_accessability_of_io_services/attempt=1.log","An
error occurred (ResourceNotFoundException) when calling the GetLogEvents
operation: The specified log stream does not exist.","Could not read served
logs:
HTTPConnectionPool(host='diagnostic-dag-check-accessability-of-io-services-484wha5d',
port=8793): Max retries exceeded with url:
/log/dag_id=diagnostic_dag/run_id=manual__2025-10-21T14:43:30.318780+00:00/task_id=check_accessability_of_io_services/attempt=1.log
(Caused by NameResolutionError(\"<urllib3.connection.HTTPConnection object at
0x7fbbfc64a270>: Failed to resolve
'diagnostic-dag-check-accessability-of-io-services-484wha5d' ([Errno -2] Name
or service not known)\"))"]
```
While the task is running, we can see that the API Server accesses the
task logs by falling back to the kube API:
```
Log message source details sources=["Reading remote log from Cloudwatch
log_group: /io/tasks log_stream:
dag_id=diagnostic_dag/run_id=manual__2025-10-23T01:14:57.974439+00:00/task_id=trigger_build_virtual_environment/attempt=1.log","An
error occurred (ResourceNotFoundException) when calling the GetLogEvents
operation: The specified log stream does not exist.","Attempting to fetch logs
from pod diagnostic-dag-trigger-build-virtual-environment-cy19vpom through kube
API","Found logs through kube API"]
```
CloudWatch is still correctly receiving the logs for the pod, but it seems the
pod is not creating the specified log stream, which prevents the API Server
from displaying the task logs. In the pod logs we do see a warning from
Watchtower:
```
{"timestamp":"2025-10-23T18:59:57.889967Z","level":"warning","event":"Received
message after logging system
shutdown","category":"WatchtowerWarning","filename":"/home/airflow/.local/lib/python3.12/site-packages/watchtower/__init__.py","lineno":464,"logger":"py.warnings"}
```
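If I read that warning correctly, records emitted after the handler has shut down are dropped, so the queued events may never be flushed and the stream may never be created. Below is a minimal standalone sketch of that behavior, not our actual Airflow handler configuration; the parameter names assume watchtower 3.x, and the log group matches the reproduction values further down while the stream name is just a placeholder.
```python
# Standalone sketch (not Airflow code) of the behavior the WatchtowerWarning points at.
# Assumes watchtower 3.x parameter names; group/stream names are placeholders.
import logging

import watchtower

handler = watchtower.CloudWatchLogHandler(
    log_group_name="airflowtest",
    log_stream_name="manual-test-stream",
)
logger = logging.getLogger("watchtower-demo")
logger.addHandler(handler)

logger.error("sent before shutdown")  # queued, then flushed when the handler closes
handler.close()                       # flushing is what creates and fills the stream

logger.error("sent after shutdown")   # should produce "Received message after
                                      # logging system shutdown" and be dropped
```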
As mentioned, this error was encountered with our custom 3.1 image as well
as the community-maintained 3.0.2 and 3.1 images deployed via the Official
Airflow Helm Chart.
### What you think should happen instead?
I would expect all Airflow components to correctly create their log streams in
CloudWatch so that the API Server can fetch and render the task logs.
### How to reproduce
Run a DAG using the deployment details below. You should see the task logs
while the DAG is running and the pod is alive; the API Server will stop
showing logs once the pod is killed. The boto3 sketch below can be used to
confirm whether the log stream was ever created.
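For reference, a quick check along these lines shows whether any matching stream exists in the log group; the group name, stream prefix, and region are taken from the error message above and our environment, so adjust them for your setup:
```python
# Diagnostic sketch: list the CloudWatch log streams matching the prefix the task
# handler should have created. Group, prefix, and region are copied from the error
# above / our environment and are assumptions to adjust.
import boto3

logs = boto3.client("logs", region_name="us-east-1")

response = logs.describe_log_streams(
    logGroupName="/io/tasks",
    logStreamNamePrefix="dag_id=diagnostic_dag/run_id=manual__2025-10-21T14:43:30.318780+00:00",
)

for stream in response.get("logStreams", []):
    print(stream["logStreamName"], stream.get("lastEventTimestamp"))
# An empty result here would match the ResourceNotFoundException the API Server reports.
```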
### Operating System
Ubuntu 22.01
### Versions of Apache Airflow Providers
_No response_
### Deployment
Official Apache Airflow Helm Chart
### Deployment details
Below is the Helm values file we used with the Official Airflow Helm
Chart to reproduce the logging issue described above.
```yaml
config:
  logging:
    remote_logging: "True"
    remote_base_log_folder: "cloudwatch://arn:aws:logs:us-east-1:<account-id>:log-group:airflowtest"
    remote_log_conn_id: "aws_default"
    logging_level: "DEBUG"
env:
  - name: "AIRFLOW_CONN_AWS_DEFAULT"
    value: "aws://"
redis:
  enabled: false
executor: "KubernetesExecutor"
postgresql:
  enabled: false
data:
  metadataConnection:
    user: "postgres"
    pass: "<password>"
    host: "<instance-id>.us-east-1.rds.amazonaws.com"
    sslmode: "require"
statsd:
  enabled: false
triggerer:
  persistence:
    enabled: false
extraConfigMaps:
  'dag-cm':
    data: |
      test-dag.py: |
        from airflow.sdk import DAG
        from airflow.providers.standard.operators.python import PythonOperator
        from datetime import datetime

        with DAG(dag_id="example_dag", start_date=datetime(2020, 1, 1)) as dag:
            def print_hello():
                print("Hello world!")

            def print_goodbye():
                print("Goodbye world!")

            hello_task = PythonOperator.partial(
                task_id="hello_task",
            ).expand(python_callable=[print_hello, print_goodbye])

            hello_task
volumes:
  - name: dag-cm
    configMap:
      name: dag-cm
  - name: airflow-data
    persistentVolumeClaim:
      claimName: "airflow-test"
volumeMounts:
  - name: dag-cm
    mountPath: /opt/airflow/dags/test-dag.py
    subPath: test-dag.py
  - mountPath: /tenant/airflow/logs
    name: airflow-data
defaultAirflowTag: "3.1.0"
airflowVersion: "3.1.0"
```
### Anything else?
I know there is a closed issue that describes exactly what we are
experiencing, and people still appear to be running into it:
https://github.com/apache/airflow/issues/52501
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [x] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)