rjtshrm commented on issue #52501: URL: https://github.com/apache/airflow/issues/52501#issuecomment-3086500325
Hi @sdg9670, I was trying to debug the issue and may have found something. After performing multiple stack traces, I discovered that our task executor successfully calls the upload function, which eventually triggers self.close (used to flush the logs). This results in failure because self.handler.shutting_down is somehow set to True. You can see the relevant code here: https://github.com/apache/airflow/blob/0702a267579fd70656901e1a9dcda27a2a1f9bbe/providers/amazon/src/airflow/providers/amazon/aws/log/cloudwatch_task_handler.py#L151-L165 Airflow uses the third-party library watchtower for CloudWatch logging. From the logs, I can see the status of the handler is already in the "shutting down" state, as shown here: https://github.com/kislyuk/watchtower/blob/765ca87a043247bbe08e8bfb2cb69d71305bd367/watchtower/__init__.py#L463-L464. As you can see the logs Line 23, the message `Received message after logging system shutdown` was logged before starting actual task mentioned which is mention [here](https://github.com/kislyuk/watchtower/blob/765ca87a043247bbe08e8bfb2cb69d71305bd367/watchtower/__init__.py#L463-L464). On further tracing back (Line 8 - Line 18 in the logs), I found out that configure_logging function is responsible for clearing existing handlers (it internally calls the python logging library and which eventually clears existing handlers): https://github.com/apache/airflow/blob/fc5410c27d9c5979d46b1c0d13a364096bbc00ab/task-sdk/src/airflow/sdk/log.py#L376-L420 At this point, I have no idea why this happens, but at least we know that CloudWatchRemoteLogIO is initiated and even called to upload, but because of the handler shutting down status, it couldn't proceed to flush the logs to remote. ```python 1 /home/airflow/.local/lib/python3.11/site-packages/airflow/configuration.py:858 DeprecationWarning: The secret_key option in [webserver] has been moved to the secret_key option in [api] - the old setting has been used, but please update your config. 2 /home/airflow/.local/lib/python3.11/site-packages/airflow/configuration.py:858 DeprecationWarning: The secret_key option in [webserver] has been moved to the secret_key option in [api] - the old setting has been used, but please update your config. 3 [2025-07-18T02:14:39.663+0000] {base_aws.py:632} WARNING - Unable to find AWS Connection ID 'MyCloudwatchConn', switching to empty. 4 /home/airflow/.local/lib/python3.11/site-packages/airflow/configuration.py:858 DeprecationWarning: The secret_key option in [webserver] has been moved to the secret_key option in [api] - the old setting has been used, but please update your config. 5 [2025-07-18T02:14:39.663+0000] {base_aws.py:632} WARNING - Unable to find AWS Connection ID 'MyCloudwatchConn', switching to empty. 6 [2025-07-18T02:14:39.664+0000] {base_aws.py:201} INFO - No connection ID provided. Fallback on boto3 credential strategy (region_name='eu-central-1'). See: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html 7 123 CloudWatchLogHandler(log_group_name='hc-airflow-dev-logs', log_stream_name='') False 8 Function: close, File: /home/airflow/.local/lib/python3.11/site-packages/watchtower/__init__.py, Line: 527 9 Function: shutdown, File: /usr/local/lib/python3.11/logging/__init__.py, Line: 2186 10 Function: _clearExistingHandlers, File: /usr/local/lib/python3.11/logging/config.py, Line: 285 11 Function: configure, File: /usr/local/lib/python3.11/logging/config.py, Line: 546 12 Function: dictConfig, File: /usr/local/lib/python3.11/logging/config.py, Line: 823 13 Function: configure_logging, File: /home/airflow/.local/lib/python3.11/site-packages/airflow/sdk/log.py, Line: 375 14 Function: execute_workload, File: /home/airflow/.local/lib/python3.11/site-packages/airflow/sdk/execution_time/execute_workload.py, Line: 51 15 Function: main, File: /home/airflow/.local/lib/python3.11/site-packages/airflow/sdk/execution_time/execute_workload.py, Line: 121 16 Function: <module>, File: /home/airflow/.local/lib/python3.11/site-packages/airflow/sdk/execution_time/execute_workload.py, Line: 125 17 Function: _run_code, File: <frozen runpy>, Line: 88 18 Function: _run_module_as_main, File: <frozen runpy>, Line: 198 19 533 close 20 {"timestamp":"2025-07-18T02:14:41.329205Z","level":"info","event":"Executing workload","workload":"ExecuteTask(token='xxxx', ti=TaskInstance(id=UUID('01981b46-5d7a-7e42-ad55-59f5d4ab014a'), task_id='offset_yesterday', dag_id='integration_tests', run_id='scheduled__2025-07-18T00:00:00+00:00', try_number=10, map_index=-1, pool_slots=1, queue='default', priority_weight=7, executor_config=None, parent_context_carrier={}, context_carrier={}, queued_dttm=datetime.datetime(2025, 7, 18, 2, 4, 13, 455399, tzinfo=TzInfo(UTC))), dag_rel_path=PurePosixPath('integration_tests.py'), bundle_info=BundleInfo(name='dags-folder', version=None), log_path='dag_id=integration_tests/run_id=scheduled__2025-07-18T00:00:00+00:00/task_id=offset_yesterday/attempt=10.log', type='ExecuteTask')","logger":"__main__"} 21 {"timestamp":"2025-07-18T02:14:45.037559Z","level":"info","event":"Connecting to server:","server":"http://airflow-api-server:8080/execution/","logger":"__main__"} 22 {"timestamp":"2025-07-18T02:14:45.077689Z","level":"info","event":"Secrets backends loaded for worker","count":1,"backend_classes":["EnvironmentVariablesBackend"],"logger":"supervisor"} 23 {"timestamp":"2025-07-18T02:14:45.355085Z","level":"warning","event":"Received message after logging system shutdown","category":"WatchtowerWarning","filename":"/home/airflow/.local/lib/python3.11/site-packages/watchtower/__init__.py","lineno":464,"logger":"py.warnings"} 24 {"timestamp":"2025-07-18T02:14:45.353477","level":"info","event":"DAG bundles loaded: dags-folder","logger":"airflow.dag_processing.bundles.manager.DagBundlesManager"} 25 {"timestamp":"2025-07-18T02:14:45.354114","level":"info","event":"Filling up the DagBag from /opt/airflow/dags/repo/integration_tests.py","logger":"airflow.models.dagbag.DagBag"} 26 {"timestamp":"2025-07-18T02:14:50.274070","level":"info","event":"Building pod offset-yesterday-vqi2egk5 with labels: {'dag_id': 'integration_tests', 'task_id': 'offset_yesterday', 'run_id': 'scheduled__2025-07-18T0000000000-a99370155', 'kubernetes_pod_operator': 'True', 'try_number': '10'}","logger":"airflow.task.operators.airflow.providers.cncf.kubernetes.operators.pod.KubernetesPodOperator"} 27 {"timestamp":"2025-07-18T02:14:50.290527","level":"info","event":"Connection Retrieved 'kubernetes_default'","logger":"airflow.hooks.base"} 28 {"timestamp":"2025-07-18T02:14:50.424679","level":"info","event":"Found matching pod offset-yesterday-vqi2egk5 with labels {'airflow_kpo_in_cluster': 'True', 'airflow_version': '3.0.3', 'dag_id': 'integration_tests', 'eks.amazonaws.com/fargate-profile': 'airflow', 'kubernetes_pod_operator': 'True', 'run_id': 'scheduled__2025-07-18T0000000000-a99370155', 'task_id': 'offset_yesterday', 'try_number': '10'}","logger":"airflow.task.operators.airflow.providers.cncf.kubernetes.operators.pod.KubernetesPodOperator"} 29 {"timestamp":"2025-07-18T02:14:50.425254","level":"info","event":"`try_number` of task_instance: 10","logger":"airflow.task.operators.airflow.providers.cncf.kubernetes.operators.pod.KubernetesPodOperator"} 30 {"timestamp":"2025-07-18T02:14:50.425339","level":"info","event":"`try_number` of pod: 10","logger":"airflow.task.operators.airflow.providers.cncf.kubernetes.operators.pod.KubernetesPodOperator"} 31 {"timestamp":"2025-07-18T02:14:50.440512","level":"info","event":"::group::Waiting until 360s to get the POD scheduled...","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 32 {"timestamp":"2025-07-18T02:14:55.461694","level":"info","event":"The Pod has an Event: Successfully enabled logging for pod from None","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 33 {"timestamp":"2025-07-18T02:15:40.625420","level":"info","event":"The Pod has an Event: Successfully assigned airflow/offset-yesterday-vqi2egk5 to fargate-ip-10-31-142-36.eu-central-1.compute.internal from None","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 34 {"timestamp":"2025-07-18T02:15:40.625593","level":"info","event":"The Pod has an Event: Pulling image \"386220377733.dkr.ecr.eu-central-1.amazonaws.com/datahub/kafka_offset_shell:latest\" from spec.containers{base}","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 35 {"timestamp":"2025-07-18T02:15:40.637954","level":"info","event":"Waiting 360s to get the POD running...","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 36 {"timestamp":"2025-07-18T02:15:50.665971","level":"info","event":"The Pod has an Event: Successfully pulled image \"386220377733.dkr.ecr.eu-central-1.amazonaws.com/datahub/kafka_offset_shell:latest\" in 10.91s (10.91s including waiting). Image size: 183626976 bytes. from spec.containers{base}","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 37 {"timestamp":"2025-07-18T02:15:50.666104","level":"info","event":"The Pod has an Event: Created container: base from spec.containers{base}","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 38 {"timestamp":"2025-07-18T02:15:55.685249","level":"info","event":"The Pod has an Event: Started container base from spec.containers{base}","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 39 {"timestamp":"2025-07-18T02:15:55.700832","level":"info","event":"::endgroup::","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 40 {"timestamp":"2025-07-18T02:16:00.746379","level":"info","event":"[base] Jul 18, 2025 2:15:51 AM com.holidaycheck.datahub.kafka.Boot$ <init>","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 41 {"timestamp":"2025-07-18T02:16:00.746602","level":"info","event":"[base] INFO: Test123","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 42 {"timestamp":"2025-07-18T02:16:00.746908","level":"info","event":"[base] SLF4J: No SLF4J providers were found.","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 43 {"timestamp":"2025-07-18T02:16:00.747041","level":"info","event":"[base] SLF4J: Defaulting to no-operation (NOP) logger implementation","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 44 {"timestamp":"2025-07-18T02:16:00.747156","level":"info","event":"[base] SLF4J: See https://www.slf4j.org/codes.html#noProviders for further details.","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 45 {"timestamp":"2025-07-18T02:16:00.747269","level":"info","event":"[base] browser-firehos:0:Offset for the specified timestamp not found","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 46 {"timestamp":"2025-07-18T02:16:00.747418","level":"info","event":"[base] browser-firehos:1:Offset for the specified timestamp not found","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 47 {"timestamp":"2025-07-18T02:16:00.747793","level":"info","event":"[base] browser-firehose:0:27905359929:Fri Jul 18 00:00:00 UTC 2025","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 48 {"timestamp":"2025-07-18T02:16:00.747930","level":"info","event":"[base] browser-firehose:1:27747815271:Fri Jul 18 00:00:00 UTC 2025","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 49 {"timestamp":"2025-07-18T02:16:00.748165","level":"info","event":"[base] browser-firehose:2:27982007204:Fri Jul 18 00:00:00 UTC 2025","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 50 {"timestamp":"2025-07-18T02:16:00.748346","level":"info","event":"[base] businesscenter-browser-firehose:0:5522065:Fri Jul 18 00:14:27 UTC 2025","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 51 {"timestamp":"2025-07-18T02:16:00.748483","level":"info","event":"[base] businesscenter-browser-firehose:1:5519819:Fri Jul 18 00:03:34 UTC 2025","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 52 {"timestamp":"2025-07-18T02:16:00.748603","level":"info","event":"[base] businesscenter-browser-firehose:2:5293720:Fri Jul 18 00:12:21 UTC 2025","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 53 {"timestamp":"2025-07-18T02:16:00.748974","level":"info","event":"[base] consumer-uia-dev-__assignor-__leader:0:Offset for the specified timestamp not found","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 54 {"timestamp":"2025-07-18T02:16:00.749108","level":"info","event":"[base] driveboo-browser-firehose:0:19414209:Fri Jul 18 00:00:29 UTC 2025","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 55 {"timestamp":"2025-07-18T02:16:00.749228","level":"info","event":"[base] driveboo-browser-firehose:1:19414200:Fri Jul 18 00:00:29 UTC 2025","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 56 {"timestamp":"2025-07-18T02:16:00.749345","level":"info","event":"[base] driveboo-browser-firehose:2:19371004:Fri Jul 18 00:00:04 UTC 2025","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 57 {"timestamp":"2025-07-18T02:16:00.749535","level":"info","event":"[base] faust_app-__assignor-__leader:0:Offset for the specified timestamp not found","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 58 {"timestamp":"2025-07-18T02:16:00.749662","level":"info","event":"[base] faust_app2-__assignor-__leader:0:Offset for the specified timestamp not found","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 59 {"timestamp":"2025-07-18T02:16:00.749948","level":"info","event":"[base] ga_serverside_agent-__assignor-__leader:0:Offset for the specified timestamp not found","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 60 {"timestamp":"2025-07-18T02:16:00.750108","level":"info","event":"[base] gav4-__assignor-__leader:0:Offset for the specified timestamp not found","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 61 {"timestamp":"2025-07-18T02:16:00.750230","level":"info","event":"[base] gav4_dev-__assignor-__leader:0:Offset for the specified timestamp not found","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 62 {"timestamp":"2025-07-18T02:16:00.750349","level":"info","event":"[base] kmodi-test-app-__assignor-__leader:0:Offset for the specified timestamp not found","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 63 {"timestamp":"2025-07-18T02:16:00.750529","level":"info","event":"[base] kmoditestapp-__assignor-__leader:0:Offset for the specified timestamp not found","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 64 {"timestamp":"2025-07-18T02:16:00.750664","level":"info","event":"[base] mobileapp-firehose:0:1781179078:Fri Jul 18 00:00:03 UTC 2025","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 65 {"timestamp":"2025-07-18T02:16:00.750783","level":"info","event":"[base] mobileapp-firehose:1:248500882:Fri Jul 18 00:00:01 UTC 2025","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 66 {"timestamp":"2025-07-18T02:16:00.750900","level":"info","event":"[base] mobileapp-firehose:2:247401363:Fri Jul 18 00:00:05 UTC 2025","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 67 {"timestamp":"2025-07-18T02:16:00.751016","level":"info","event":"[base] mobileapp-refined:0:Offset for the specified timestamp not found","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 68 {"timestamp":"2025-07-18T02:16:00.751130","level":"info","event":"[base] p-1test-app-__assignor-__leader:0:Offset for the specified timestamp not found","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 69 {"timestamp":"2025-07-18T02:16:00.751246","level":"info","event":"[base] p-test-app-1-__assignor-__leader:0:Offset for the specified timestamp not found","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 70 {"timestamp":"2025-07-18T02:16:00.751362","level":"info","event":"[base] p-test-app-__assignor-__leader:0:Offset for the specified timestamp not found","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 71 {"timestamp":"2025-07-18T02:16:00.751477","level":"info","event":"[base] pablo-test-app-__assignor-__leader:0:Offset for the specified timestamp not found","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 72 {"timestamp":"2025-07-18T02:16:00.751657","level":"info","event":"[base] pairing-1-test-app-__assignor-__leader:0:Offset for the specified timestamp not found","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 73 {"timestamp":"2025-07-18T02:16:00.751800","level":"info","event":"[base] pairing-2-test-app-__assignor-__leader:0:Offset for the specified timestamp not found","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 74 {"timestamp":"2025-07-18T02:16:00.751918","level":"info","event":"[base] pairing-3-app-__assignor-__leader:0:Offset for the specified timestamp not found","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 75 {"timestamp":"2025-07-18T02:16:00.752146","level":"info","event":"[base] pairing-3-test-app-__assignor-__leader:0:Offset for the specified timestamp not found","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 76 {"timestamp":"2025-07-18T02:16:00.752276","level":"info","event":"[base] pairing-4-app-__assignor-__leader:0:Offset for the specified timestamp not found","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 77 {"timestamp":"2025-07-18T02:16:00.752489","level":"info","event":"[base] pairing-app-0.39487885716526927-__assignor-__leader:0:Offset for the specified timestamp not found","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 78 {"timestamp":"2025-07-18T02:16:00.752621","level":"info","event":"[base] pairing-app-0.5064171649610345-__assignor-__leader:0:Offset for the specified timestamp not found","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 79 {"timestamp":"2025-07-18T02:16:00.752920","level":"info","event":"[base] pairing-app-0.7001794979471214-__assignor-__leader:0:Offset for the specified timestamp not found","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 80 {"timestamp":"2025-07-18T02:16:00.753054","level":"info","event":"[base] pairing-app-0.7525169392136984-__assignor-__leader:0:Offset for the specified timestamp not found","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 81 {"timestamp":"2025-07-18T02:16:00.753178","level":"info","event":"[base] pairing-app-1-__assignor-__leader:0:Offset for the specified timestamp not found","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 82 {"timestamp":"2025-07-18T02:16:00.753295","level":"info","event":"[base] pairing-app-11-__assignor-__leader:0:Offset for the specified timestamp not found","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 83 {"timestamp":"2025-07-18T02:16:00.753406","level":"info","event":"[base] pairing-test-app-__assignor-__leader:0:Offset for the specified timestamp not found","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 84 {"timestamp":"2025-07-18T02:16:00.753521","level":"info","event":"[base] uia_mongo-__assignor-__leader:0:Offset for the specified timestamp not found","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 85 {"timestamp":"2025-07-18T02:16:00.753877","level":"info","event":"[base] urlauber_interactions_agent-__assignor-__leader:0:Offset for the specified timestamp not found","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 86 {"timestamp":"2025-07-18T02:16:00.754058","level":"info","event":"[base] urlauber_interactions_agent_mongo-__assignor-__leader:0:Offset for the specified timestamp not found","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 87 {"timestamp":"2025-07-18T02:16:00.754274","level":"info","event":"[base] user-action-mapper-__assignor-__leader:0:Offset for the specified timestamp not found\n","logger":"airflow.providers.cncf.kubernetes.utils.pod_manager.PodManager"} 88 {"timestamp":"2025-07-18T02:16:00.873527","level":"info","event":"Deleting pod: offset-yesterday-vqi2egk5","logger":"airflow.task.operators.airflow.providers.cncf.kubernetes.operators.pod.KubernetesPodOperator"} 89 {"timestamp":"2025-07-18T02:16:00.956315Z","level":"info","event":"Task finished","exit_code":0,"duration":75.87933083500002,"final_state":"success","logger":"supervisor"} 90 928 _upload_logs 91 550 handler: CloudWatchRemoteLogIO(base_log_folder=PosixPath('/opt/airflow/logs'), remote_base='cloudwatch://arn:aws:logs:eu-central-1:386220377733:log-group:hc-airflow-dev-logs', delete_local_copy=False, log_group_arn='arn:aws:logs:eu-central-1:386220377733:log-group:hc-airflow-dev-logs', log_stream_name='') 92 557 relative_path: dag_id=integration_tests/run_id=scheduled__2025-07-18T00:00:00+00:00/task_id=offset_yesterday/attempt=10.log 93 567 dag_id=integration_tests/run_id=scheduled__2025-07-18T00:00:00+00:00/task_id=offset_yesterday/attempt=10.log 94 157 CloudWatchLogHandler(log_group_name='hc-airflow-dev-logs', log_stream_name='dag_id=integration_tests/run_id=scheduled__2025-07-18T00_00_00+00_00/task_id=offset_yesterday/attempt=10.log') True ``` @gopidesupavan @o-nikolas @vincbeck -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
