fshehadeh opened a new issue, #22878:
URL: https://github.com/apache/airflow/issues/22878
### Apache Airflow Provider(s)
amazon
### Versions of Apache Airflow Providers
apache-airflow-providers-amazon 3.2.0
### Apache Airflow version
2.2.5 (latest released)
### Operating System
Linux / ECS
### Deployment
Other Docker-based deployment
### Deployment details
We are running Docker on OpenShift 4
### What happened
There appears to be a bug in the ECS operator's "reattach"
flow. We are running into some instability issues that cause our Airflow
scheduler to restart. When the scheduler restarts while an ECS task is running,
the ECS operator tries to reattach to that task. It succeeds in finding the ECS
task and attaching to it, but when it then tries to fetch the logs, it throws
the following error:
```
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1334, in _run_raw_task
    self._execute_task_with_callbacks(context)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1460, in _execute_task_with_callbacks
    result = self._execute_task(context, self.task)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1516, in _execute_task
    result = execute_callable(context=context)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/session.py", line 70, in wrapper
    return func(*args, session=session, **kwargs)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/amazon/aws/operators/ecs.py", line 295, in execute
    self.task_log_fetcher = self._get_task_log_fetcher()
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/amazon/aws/operators/ecs.py", line 417, in _get_task_log_fetcher
    log_stream_name = f"{self.awslogs_stream_prefix}/{self.ecs_task_id}"
AttributeError: 'EcsOperator' object has no attribute 'ecs_task_id'
```
At this point, the operator fails, the Airflow task is marked for
retries, and it eventually gets marked as failed, while on the ECS side the
task keeps running fine. The manual workaround is to wait for the ECS task to
complete, then mark the Airflow task as successful and trigger the downstream
tasks. This is not very practical, since the task can take a long time (in our
case, hours).
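The failure mode can be illustrated with a minimal, self-contained sketch. This is NOT the provider's actual code; the class and attribute assignments are a simplification built from the traceback above, assuming that `ecs_task_id` is assigned only on the fresh-start path and that the reattach path restores just the task ARN:

```python
# Minimal sketch (NOT the provider's actual code) of the failure mode:
# ecs_task_id is only assigned on the fresh-start path, so the reattach
# path raises AttributeError when the log fetcher is built.

class EcsOperatorSketch:
    def __init__(self, awslogs_stream_prefix: str):
        self.awslogs_stream_prefix = awslogs_stream_prefix
        self.arn = None
        # Note: ecs_task_id is deliberately NOT initialised here.

    def _start_task(self) -> None:
        # Fresh-start path: both arn and ecs_task_id are set.
        self.arn = "arn:aws:ecs:us-east-1:123456789012:task/my-cluster/abc123"
        self.ecs_task_id = self.arn.split("/")[-1]

    def _try_reattach_task(self) -> None:
        # Reattach path: only the ARN of the running task is restored.
        self.arn = "arn:aws:ecs:us-east-1:123456789012:task/my-cluster/abc123"

    def _get_task_log_fetcher(self) -> str:
        # Mirrors the line that fails in the traceback above.
        return f"{self.awslogs_stream_prefix}/{self.ecs_task_id}"

op = EcsOperatorSketch("ecs/my-task")
op._try_reattach_task()
try:
    op._get_task_log_fetcher()
except AttributeError as exc:
    print(exc)  # 'EcsOperatorSketch' object has no attribute 'ecs_task_id'
```

Running `_start_task()` instead of `_try_reattach_task()` makes `_get_task_log_fetcher()` succeed, which matches the observation that only the reattach flow is broken.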
### What you think should happen instead
I expect the ECS operator to reattach to the running ECS task and pull the
logs as normal.
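One possible fix, sketched below, would be to derive `ecs_task_id` from the task ARN on the reattach path as well, so the attribute exists before the log fetcher is built. This is a sketch of the idea, not the actual provider patch; the helper name `task_id_from_arn` and the example account/cluster names are hypothetical. It relies only on the documented fact that the task id is the last `/`-separated segment of an ECS task ARN:

```python
# Hedged sketch of a possible fix (not the actual provider patch):
# derive the ECS task id from the task ARN, so the reattach path can
# set self.ecs_task_id the same way the fresh-start path would.

def task_id_from_arn(task_arn: str) -> str:
    """Return the task id, i.e. the last '/'-separated segment of the ARN.

    Works for both old-format (.../task/<task-id>) and new-format
    (.../task/<cluster>/<task-id>) ECS task ARNs.
    """
    return task_arn.split("/")[-1]

# Example ARNs (hypothetical account and cluster names):
new_format = "arn:aws:ecs:us-east-1:123456789012:task/my-cluster/0123456789abcdef"
old_format = "arn:aws:ecs:us-east-1:123456789012:task/0123456789abcdef"

print(task_id_from_arn(new_format))  # 0123456789abcdef
print(task_id_from_arn(old_format))  # 0123456789abcdef
```

In the operator, the reattach path would then set `self.ecs_task_id = task_id_from_arn(self.arn)` right after it finds the running task, presumably mirroring what the start path does when it launches a new one.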
### How to reproduce
Configure a task that runs using the ECS operator, and make sure it takes a
very long time. Start the task, and once the logs start flowing to Airflow,
restart the Airflow scheduler. Wait for the scheduler to come back up and
check whether, upon retry, the task is able to reattach and fetch the logs
(it fails with the error above).
### Anything else
When Airflow restarts, it tries to kill the task at hand. In our case, we
didn't give the AWS role permission to kill the running ECS tasks, so the ECS
tasks keep running during the Airflow restart. Others might not have this
setup, and therefore they won't go through the "reattach" flow and won't
encounter the issue reported here. Granting that kill permission is not a good
option for us, since our tasks can take hours to complete and we don't want to
interfere with their execution.
We also need to improve the stability of the OpenShift infrastructure where
Airflow is running, so that the scheduler doesn't restart so often, but that is
a different story.
### Are you willing to submit PR?
- [X] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)