mjsqu opened a new issue, #33711:
URL: https://github.com/apache/airflow/issues/33711
### Apache Airflow version
Other Airflow 2 version (please specify below)
### What happened
Running on MWAA v2.5.1 with the Amazon provider (and therefore `EcsRunTaskOperator`) upgraded to v8.3.0, all `EcsRunTaskOperator` tasks appear to 'detach' from the underlying ECS task after 10 minutes. Running a command:
```
sleep 800
```
results in:
```
[2023-08-25, 10:15:12 NZST] {{ecs.py:533}} INFO - EcsOperator overrides:
{'containerOverrides': [{'name': 'meltano', 'command': ['sleep', '800']}]}
...
[2023-08-25, 10:15:13 NZST] {{ecs.py:651}} INFO - ECS task ID is:
b2681954f66148e8909d5e74c4b94c1a
[2023-08-25, 10:15:13 NZST] {{ecs.py:565}} INFO - Starting ECS Task Log
Fetcher
[2023-08-25, 10:15:43 NZST] {{base_aws.py:554}} WARNING - Unable to find AWS
Connection ID 'aws_ecs', switching to empty.
[2023-08-25, 10:15:43 NZST] {{base_aws.py:160}} INFO - No connection ID
provided. Fallback on boto3 credential strategy (region_name='ap-southeast-2').
See:
https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html
[2023-08-25, 10:25:13 NZST] {{taskinstance.py:1768}} ERROR - Task failed
with exception
Traceback (most recent call last):
File
"/usr/local/airflow/.local/lib/python3.10/site-packages/airflow/utils/session.py",
line 75, in wrapper
return func(*args, session=session, **kwargs)
File
"/usr/local/airflow/.local/lib/python3.10/site-packages/airflow/providers/amazon/aws/operators/ecs.py",
line 570, in execute
self._wait_for_task_ended()
File
"/usr/local/airflow/.local/lib/python3.10/site-packages/airflow/providers/amazon/aws/operators/ecs.py",
line 684, in _wait_for_task_ended
waiter.wait(
File
"/usr/local/airflow/.local/lib/python3.10/site-packages/botocore/waiter.py",
line 55, in wait
Waiter.wait(self, **kwargs)
File
"/usr/local/airflow/.local/lib/python3.10/site-packages/botocore/waiter.py",
line 388, in wait
raise WaiterError(
botocore.exceptions.WaiterError: Waiter TasksStopped failed: Max attempts
exceeded
```
This appears to be caused by the newly introduced `waiter.wait` call: the `WaiterConfig` it passes overrides the `sys.maxsize` assignment on the preceding line with `self.waiter_max_attempts`, which defaults to 100 instead of the effectively unlimited `sys.maxsize`:
```python
waiter.config.max_attempts = sys.maxsize  # timeout is managed by airflow
waiter.wait(
    cluster=self.cluster,
    tasks=[self.arn],
    WaiterConfig={
        "Delay": self.waiter_delay,
        "MaxAttempts": self.waiter_max_attempts,
    },
)
```
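If the operator's defaults are `waiter_delay=6` and `waiter_max_attempts=100` (an assumption worth confirming against the 8.3.0 provider source), the arithmetic lines up exactly with the observed 10-minute detach:

```python
# Assumed EcsRunTaskOperator defaults in apache-airflow-providers-amazon 8.3.0
# (not confirmed here; check the provider source):
waiter_delay = 6           # seconds between TasksStopped polls
waiter_max_attempts = 100  # polls before WaiterError is raised

timeout_seconds = waiter_delay * waiter_max_attempts
print(timeout_seconds)  # 600 seconds = 10 minutes
```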
### What you think should happen instead
Set the default `waiter_max_attempts` in `EcsRunTaskOperator` to `sys.maxsize` to restore the previous behaviour.
### How to reproduce
1. Set up ECS with a task definition, cluster, etc.
2. Assuming ECS is fully set up, build a DAG with an `EcsRunTaskOperator` task
3. Run a task that takes more than 10 minutes, e.g. in `overrides` set `command` to `["sleep", "800"]`
4. The Airflow task fails after 10 minutes, while the ECS task runs for the full 800 seconds and completes successfully
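As a stop-gap until the default changes, affected DAGs can pass `waiter_max_attempts` explicitly. The sketch below shows the operator arguments as a plain dict (the cluster and task-definition names are hypothetical placeholders); in a real DAG these kwargs would be passed straight to `EcsRunTaskOperator`:

```python
import sys

# Hypothetical cluster/task-definition names; substitute your own.
# These kwargs would be passed to EcsRunTaskOperator inside a DAG.
ecs_task_kwargs = {
    "task_id": "long_running_task",
    "cluster": "my-cluster",
    "task_definition": "meltano",
    "overrides": {
        "containerOverrides": [
            {"name": "meltano", "command": ["sleep", "800"]},
        ],
    },
    # Workaround: restore the pre-8.x "wait indefinitely" behaviour so the
    # timeout is governed by Airflow itself rather than the boto3 waiter.
    "waiter_max_attempts": sys.maxsize,
}
```

With `waiter_max_attempts` set this way, the `TasksStopped` waiter no longer gives up at 100 polls, and a `sleep 800` container should be observed to completion.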
### Operating System
MWAA v2.5.1 Python 3.10 (Linux)
### Versions of Apache Airflow Providers
apache-airflow-providers-amazon==8.3.0
### Deployment
Amazon (AWS) MWAA
### Deployment details
n/a
### Anything else
n/a
### Are you willing to submit PR?
- [X] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)