anneadb opened a new issue, #39026:
URL: https://github.com/apache/airflow/issues/39026

   ### Apache Airflow Provider(s)
   
   amazon
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-amazon==8.7.1
   
   ### Apache Airflow version
   
   2.7.2
   
   ### Operating System
   
   Amazon Linux AMI
   
   ### Deployment
   
   Amazon (AWS) MWAA
   
   ### Deployment details
   
   We are deploying MWAA using CDK.
   Dags are located in S3.
   
   ### What happened
   
   We are running several Airflow tasks that start ECS containers and wait for 
their completion before starting the following tasks. I noticed that logs only 
appear upon the completion of the task.
   However, if a task runs into a TimeoutError in Airflow, no task logs are 
created at all in Cloudwatch.
   
   I have been in contact with AWS Enterprise Support but they are saying that 
the issue is located in the Operator and not MWAA.
   
   ### What you think should happen instead
   
   ECS task logs are written to CloudWatch even if the Airflow task fails.
   Even better: task logs are written continuously to CloudWatch.
   
   ### How to reproduce
   
   ```
   from datetime import datetime, timedelta
   
   from airflow import DAG
   from airflow.providers.amazon.aws.operators.ecs import EcsRunTaskOperator
   
   DEFAULT_ARGS = {
       "owner": "airflow",
       "depends_on_past": False,
       "start_date": datetime(2024, 1, 1),
   }
   
   with DAG(
       "sample",
       default_args=DEFAULT_ARGS,
       description="A test DAG to try out functionality",
       schedule=None,
   ) as dag:
       EcsRunTaskOperator(
           task_id="ecs_operator",
           dag=dag,
           execution_timeout=timedelta(minutes=2),
           retries=3,
           aws_conn_id="aws_default",
           cluster="ecs-airflow-cluster",
           task_definition="ecs-airflow-task:1",
           launch_type="FARGATE",
           overrides={"containerOverrides": [
                   {
                   "name": CONTAINER_NAME,
                   "command": ["python", "-c", "import time; for i in 
range(30): print(i); time.sleep(10)"],
               },
           ]},
           network_configuration={
               "awsvpcConfiguration": {
                   "subnets": ["subnet-xxxxx", "subnet-xxx"],
                    "securityGroups": ["sg-xxxx"]
           }
           },
           awslogs_group="ecs-airflow",
           awslogs_region="eu-central-1",
           awslogs_stream_prefix=f"ecs/{CONTAINER_NAME}",
           awslogs_fetch_interval=timedelta(seconds=5)
       )
   ```
   
   ### Anything else
   
   Logs example:
   
   ```
   ip-xxxxx.eu-central-1.compute.internal
   *** Reading remote log from Cloudwatch log_group: airflow-my_env_name-Task 
log_stream: 
dag_id=my_dag_id/run_id=scheduled__2024-04-02T05_30_00+00_00/task_id=mytaskid/attempt=1.log.
   [2024-04-03, 07:30:19 CEST] {{taskinstance.py:1159}} INFO - Dependencies all 
met for dep_context=non-requeueable deps ti=<TaskInstance: my_dag_id.mytaskid 
scheduled__2024-04-02T05:30:00+00:00 [queued]>
   [2024-04-03, 07:30:20 CEST] {{taskinstance.py:1159}} INFO - Dependencies all 
met for dep_context=requeueable deps ti=<TaskInstance: my_dag_id.mytaskid 
scheduled__2024-04-02T05:30:00+00:00 [queued]>
   [2024-04-03, 07:30:20 CEST] {{taskinstance.py:1361}} INFO - Starting attempt 
1 of 2
   [2024-04-03, 07:30:20 CEST] {{taskinstance.py:1382}} INFO - Executing 
<Task(EcsRunTaskOperator): mytaskid> on 2024-04-02 05:30:00+00:00
   [2024-04-03, 07:30:20 CEST] {{standard_task_runner.py:57}} INFO - Started 
process 24629 to run task
   [2024-04-03, 07:30:20 CEST] {{standard_task_runner.py:84}} INFO - Running: 
['airflow', 'tasks', 'run', 'my_dag_id', 'mytaskid', 
'scheduled__2024-04-02T05:30:00+00:00', '--job-id', '3107', '--raw', 
'--subdir', 'DAGS_FOLDER/my_dag_file.py', '--cfg-path', '/tmp/tmpw6wsfwwu']
   [2024-04-03, 07:30:20 CEST] {{standard_task_runner.py:85}} INFO - Job 3107: 
Subtask mytaskid
   [2024-04-03, 07:30:20 CEST] {{task_command.py:416}} INFO - Running 
<TaskInstance: my_dag_id.mytaskid scheduled__2024-04-02T05:30:00+00:00 
[running]> on host ip-10-123-136-202.eu-central-1.compute.internal
   [2024-04-03, 07:30:21 CEST] {{taskinstance.py:1662}} INFO - Exporting env 
vars: AIRFLOW_CTX_DAG_OWNER='di' AIRFLOW_CTX_DAG_ID='my_dag_id' 
AIRFLOW_CTX_TASK_ID='mytaskid' 
AIRFLOW_CTX_EXECUTION_DATE='2024-04-02T05:30:00+00:00' 
AIRFLOW_CTX_TRY_NUMBER='1' 
AIRFLOW_CTX_DAG_RUN_ID='scheduled__2024-04-02T05:30:00+00:00'
   [2024-04-03, 07:30:21 CEST] {{ecs.py:531}} INFO - Running ECS Task - Task 
definition: my_task_definition_name - on cluster my-cluster-name
   [2024-04-03, 07:30:21 CEST] {{ecs.py:534}} INFO - EcsOperator overrides: 
{'containerOverrides': []}
   [2024-04-03, 07:30:22 CEST] {{base.py:73}} INFO - Using connection ID 
'my_aws_account_connection_id' for task execution.
   [2024-04-03, 07:30:24 CEST] {{ecs.py:644}} INFO - ECS Task started: 
{'tasks': [{'attachments': [{'id': 'eab65079-d61e-4c16-9984-cec51e01cae4', 
'type': 'ElasticNetworkInterface', 'status': 'PRECREATED', 'details': [{'name': 
'subnetId', 'value': 'subnet-12345'}]}], 'attributes': [{'name': 
'ecs.cpu-architecture', 'value': 'x86_64'}], 'availabilityZone': 
'eu-central-1b', 'clusterArn': 
'arn:aws:ecs:eu-central-1:123456789:cluster/my-cluster-name', 'containers': 
[{'containerArn': 
'arn:aws:ecs:eu-central-1:123456789:container/my-cluster-name/d6907e6e16c54ff0ba6bc477ef4f5c6e/10c5ac9b-9623-418b-afd7-10d2d0b762f6',
 'taskArn': 
'arn:aws:ecs:eu-central-1:123456789:task/my-cluster-name/d6907e6e16c54ff0ba6bc477ef4f5c6e',
 'name': 'my_task_definition_name', 'image': 
'123456789.dkr.ecr.eu-central-1.amazonaws.com/my_task_definition_name:0.0.2', 
'lastStatus': 'PENDING', 'networkInterfaces': [], 'cpu': '0'}], 'cpu': '512', 
'createdAt': datetime.datetime(2024, 4, 3, 5, 30, 24, 868000, tzinfo=tzlocal()),
  'desiredStatus': 'RUNNING', 'enableExecuteCommand': False, 'group': 
'family:my_task_definition_name', 'lastStatus': 'PROVISIONING', 'launchType': 
'FARGATE', 'memory': '1024', 'overrides': {'containerOverrides': [{'name': 
'my_task_definition_name'}], 'inferenceAcceleratorOverrides': []}, 
'platformVersion': '1.4.0', 'platformFamily': 'Linux', 'startedBy': 'di', 
'tags': [], 'taskArn': 
'arn:aws:ecs:eu-central-1:123456789:task/my-cluster-name/d6907e6e16c54ff0ba6bc477ef4f5c6e',
 'taskDefinitionArn': 
'arn:aws:ecs:eu-central-1:123456789:task-definition/my_task_definition_name:2', 
'version': 1, 'ephemeralStorage': {'sizeInGiB': 20}}], 'failures': [], 
'ResponseMetadata': {'RequestId': '02377efa-1c03-443e-aad8-c6d0d6157f06', 
'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': 
'02377efa-1c03-443e-aad8-c6d0d6157f06', 'content-type': 
'application/x-amz-json-1.1', 'content-length': '1692', 'date': 'Wed, 03 Apr 
2024 05:30:24 GMT'}, 'RetryAttempts': 0}}
   [2024-04-03, 07:30:24 CEST] {{ecs.py:647}} INFO - ECS task ID is: 
d6907e6e16c54ff0ba6bc477ef4f5c6e
   [2024-04-03, 07:30:25 CEST] {{ecs.py:573}} INFO - Starting ECS Task Log 
Fetcher
   [2024-04-03, 07:30:55 CEST] {{base.py:73}} INFO - Using connection ID 
'my_aws_account_connection_id' for task execution.
   [2024-04-03, 07:50:21 CEST] {{timeout.py:68}} ERROR - Process timed out, 
PID: 24629
   [2024-04-03, 07:50:21 CEST] {{ecs.py:757}} INFO - {'task': {'attachments': 
[{'id': 'eab65079-d61e-4c16-9984-cec51e01cae4', 'type': 
'ElasticNetworkInterface', 'status': 'ATTACHED', 'details': [{'name': 
'subnetId', 'value': 'subnet-12345'}, {'name': 'networkInterfaceId', 'value': 
'eni-08793a79328aa8a69'}, {'name': 'macAddress', 'value': '06:08:01:5a:69:fb'}, 
{'name': 'privateDnsName', 'value': 
'ip-10-123-41-14.eu-central-1.compute.internal'}, {'name': 
'privateIPv4Address', 'value': 'xxxxx'}]}], 'attributes': [{'name': 
'ecs.cpu-architecture', 'value': 'x86_64'}], 'availabilityZone': 
'eu-central-1b', 'clusterArn': 
'arn:aws:ecs:eu-central-1:123456789:cluster/my-cluster-name', 'connectivity': 
'CONNECTED', 'connectivityAt': datetime.datetime(2024, 4, 3, 5, 30, 28, 546000, 
tzinfo=tzlocal()), 'containers': [{'containerArn': 
'arn:aws:ecs:eu-central-1:123456789:container/my-cluster-name/d6907e6e16c54ff0ba6bc477ef4f5c6e/10c5ac9b-9623-418b-afd7-10d2d0b762f6',
 'taskArn': 'arn:aws:ecs:eu-central
 -1:123456789:task/my-cluster-name/d6907e6e16c54ff0ba6bc477ef4f5c6e', 'name': 
'my_task_definition_name', 'image': 
'123456789.dkr.ecr.eu-central-1.amazonaws.com/my_task_definition_name:0.0.2', 
'imageDigest': 
'sha256:9d8af306540177ee5faef1dea1f2c72c3e16a02aedcb0c85fdd42cf96f8eb7d9', 
'runtimeId': 'd6907e6e16c54ff0ba6bc477ef4f5c6e-873919162', 'lastStatus': 
'RUNNING', 'networkBindings': [], 'networkInterfaces': [{'attachmentId': 
'eab65079-d61e-4c16-9984-cec51e01cae4', 'privateIpv4Address': 'xxxxx'}], 'cpu': 
'0'}], 'cpu': '512', 'createdAt': datetime.datetime(2024, 4, 3, 5, 30, 24, 
868000, tzinfo=tzlocal()), 'desiredStatus': 'STOPPED', 'enableExecuteCommand': 
False, 'group': 'family:my_task_definition_name', 'lastStatus': 'RUNNING', 
'launchType': 'FARGATE', 'memory': '1024', 'overrides': {'containerOverrides': 
[{'name': 'my_task_definition_name'}], 'inferenceAcceleratorOverrides': []}, 
'platformVersion': '1.4.0', 'platformFamily': 'Linux', 'pullStartedAt': 
datetime.datetime(2024, 4, 3, 5, 
 30, 37, 967000, tzinfo=tzlocal()), 'pullStoppedAt': datetime.datetime(2024, 4, 
3, 5, 31, 9, 202000, tzinfo=tzlocal()), 'startedAt': datetime.datetime(2024, 4, 
3, 5, 31, 9, 595000, tzinfo=tzlocal()), 'startedBy': 'di', 'stopCode': 
'UserInitiated', 'stoppedReason': 'Task killed by the user', 'stoppingAt': 
datetime.datetime(2024, 4, 3, 5, 50, 21, 287000, tzinfo=tzlocal()), 'tags': [], 
'taskArn': 
'arn:aws:ecs:eu-central-1:123456789:task/my-cluster-name/d6907e6e16c54ff0ba6bc477ef4f5c6e',
 'taskDefinitionArn': 
'arn:aws:ecs:eu-central-1:123456789:task-definition/my_task_definition_name:2', 
'version': 4, 'ephemeralStorage': {'sizeInGiB': 20}}, 'ResponseMetadata': 
{'RequestId': '67522c9e-82c3-45e9-b0b1-2c71a9ce1fc0', 'HTTPStatusCode': 200, 
'HTTPHeaders': {'x-amzn-requestid': '67522c9e-82c3-45e9-b0b1-2c71a9ce1fc0', 
'content-type': 'application/x-amz-json-1.1', 'content-length': '2470', 'date': 
'Wed, 03 Apr 2024 05:50:20 GMT'}, 'RetryAttempts': 0}}
   [2024-04-03, 07:50:21 CEST] {{taskinstance.py:1937}} ERROR - Task failed 
with exception
   Traceback (most recent call last):
     File 
"/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/providers/amazon/aws/operators/ecs.py",
 line 578, in execute
       self._wait_for_task_ended()
     File 
"/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/providers/amazon/aws/operators/ecs.py",
 line 669, in _wait_for_task_ended
       waiter.wait(
     File 
"/usr/local/airflow/.local/lib/python3.11/site-packages/botocore/waiter.py", 
line 55, in wait
       Waiter.wait(self, **kwargs)
     File 
"/usr/local/airflow/.local/lib/python3.11/site-packages/botocore/waiter.py", 
line 393, in wait
       time.sleep(sleep_amount)
     File 
"/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/utils/timeout.py",
 line 69, in handle_timeout
       raise AirflowTaskTimeout(self.error_message)
   airflow.exceptions.AirflowTaskTimeout: Timeout, PID: 24629
   [2024-04-03, 07:50:21 CEST] {{taskinstance.py:1400}} INFO - Marking task as 
UP_FOR_RETRY. dag_id=my_dag_id, task_id=mytaskid, 
execution_date=20240402T053000, start_date=20240403T053019, 
end_date=20240403T055021
   [2024-04-03, 07:50:21 CEST] {{standard_task_runner.py:104}} ERROR - Failed 
to execute job 3107 for task mytaskid (Timeout, PID: 24629; 24629)
   [2024-04-03, 07:50:21 CEST] {{local_task_job_runner.py:228}} INFO - Task 
exited with return code 1
   [2024-04-03, 07:50:21 CEST] {{taskinstance.py:2778}} INFO - 0 downstream 
tasks scheduled from follow-on schedule check
   ```
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to