fshehadeh opened a new issue, #22878:
URL: https://github.com/apache/airflow/issues/22878
### Apache Airflow Provider(s)
amazon
### Versions of Apache Airflow Providers
apache-airflow-providers-amazon 3.2.0
### Apache Airflow version
2.2.5 (latest released)
### Operating System
Linux / ECS
### Deployment
Other Docker-based deployment
### Deployment details
We are running Docker on OpenShift 4
### What happened
There appears to be a bug in the ECS operator's "reattach"
flow. We are running into some instability issues that cause our Airflow
scheduler to restart. When the scheduler restarts while an ECS task is running,
the ECS operator tries to reattach to that task. It succeeds in finding the ECS
task and attaching to it, but when it then tries to fetch the logs, it throws
the following error:
```
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1334, in _run_raw_task
    self._execute_task_with_callbacks(context)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1460, in _execute_task_with_callbacks
    result = self._execute_task(context, self.task)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1516, in _execute_task
    result = execute_callable(context=context)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/session.py", line 70, in wrapper
    return func(*args, session=session, **kwargs)
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/amazon/aws/operators/ecs.py", line 295, in execute
    self.task_log_fetcher = self._get_task_log_fetcher()
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/providers/amazon/aws/operators/ecs.py", line 417, in _get_task_log_fetcher
    log_stream_name = f"{self.awslogs_stream_prefix}/{self.ecs_task_id}"
AttributeError: 'EcsOperator' object has no attribute 'ecs_task_id'
```
At this point, the operator fails, the Airflow task is marked for
retries, and it eventually gets marked as failed, while on the ECS side the
task keeps running fine. The manual workaround is to wait for the ECS task to
complete, then mark the Airflow task as successful and trigger the downstream
tasks. This is not very practical, since the task can take a long time (in our
case, hours).
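The failure mode can be illustrated with a minimal, self-contained sketch. This is NOT the provider's actual code; the class and attribute assignments are a simplification built from the traceback above, assuming that `ecs_task_id` is assigned only on the fresh-start path and that the reattach path restores just the task ARN:

```python
# Minimal sketch (NOT the provider's actual code) of the failure mode:
# ecs_task_id is only assigned on the fresh-start path, so the reattach
# path raises AttributeError when the log fetcher is built.

class EcsOperatorSketch:
    def __init__(self, awslogs_stream_prefix: str):
        self.awslogs_stream_prefix = awslogs_stream_prefix
        self.arn = None
        # Note: ecs_task_id is deliberately NOT initialised here.

    def _start_task(self) -> None:
        # Fresh-start path: both arn and ecs_task_id are set.
        self.arn = "arn:aws:ecs:us-east-1:123456789012:task/my-cluster/abc123"
        self.ecs_task_id = self.arn.split("/")[-1]

    def _try_reattach_task(self) -> None:
        # Reattach path: only the ARN of the running task is restored.
        self.arn = "arn:aws:ecs:us-east-1:123456789012:task/my-cluster/abc123"

    def _get_task_log_fetcher(self) -> str:
        # Mirrors the line that fails in the traceback above.
        return f"{self.awslogs_stream_prefix}/{self.ecs_task_id}"

op = EcsOperatorSketch("ecs/my-task")
op._try_reattach_task()
try:
    op._get_task_log_fetcher()
except AttributeError as exc:
    print(exc)  # 'EcsOperatorSketch' object has no attribute 'ecs_task_id'
```

Running `_start_task()` instead of `_try_reattach_task()` makes `_get_task_log_fetcher()` succeed, which matches the observation that only the reattach flow is broken.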
### What you think should happen instead
I expect the ECS operator to reattach to the running ECS task and pull the
logs as normal.
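One possible fix, sketched below, would be to derive `ecs_task_id` from the task ARN on the reattach path as well, so the attribute exists before the log fetcher is built. This is a sketch of the idea, not the actual provider patch; the helper name `task_id_from_arn` and the example account/cluster names are hypothetical. It relies only on the documented fact that the task id is the last `/`-separated segment of an ECS task ARN:

```python
# Hedged sketch of a possible fix (not the actual provider patch):
# derive the ECS task id from the task ARN, so the reattach path can
# set self.ecs_task_id the same way the fresh-start path would.

def task_id_from_arn(task_arn: str) -> str:
    """Return the task id, i.e. the last '/'-separated segment of the ARN.

    Works for both old-format (.../task/<task-id>) and new-format
    (.../task/<cluster>/<task-id>) ECS task ARNs.
    """
    return task_arn.split("/")[-1]

# Example ARNs (hypothetical account and cluster names):
new_format = "arn:aws:ecs:us-east-1:123456789012:task/my-cluster/0123456789abcdef"
old_format = "arn:aws:ecs:us-east-1:123456789012:task/0123456789abcdef"

print(task_id_from_arn(new_format))  # 0123456789abcdef
print(task_id_from_arn(old_format))  # 0123456789abcdef
```

In the operator, the reattach path would then set `self.ecs_task_id = task_id_from_arn(self.arn)` right after it finds the running task, presumably mirroring what the start path does when it launches a new one.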
### How to reproduce
Configure a task that runs using the ECS operator, and make sure it takes a
very long time. Start the task, and once the logs start flowing to Airflow,
restart the Airflow scheduler. Wait for the scheduler to come back up and
check whether, upon retry, the task is able to reattach and fetch the logs
(it fails with the error above).
### Anything else
When Airflow restarts, it tries to kill the task at hand. In our case, we
didn't give the AWS role permission to kill the running ECS tasks, so the ECS
tasks keep running during the Airflow restart. Others might not have this
setup, and therefore they won't go through the "reattach" flow and won't
encounter the issue reported here. Granting that kill permission is not a good
option for us, since our tasks can take hours to complete and we don't want to
interfere with their execution.
We also need to improve the stability of the OpenShift infrastructure where
Airflow is running, so that the scheduler doesn't restart so often, but that is
a different story.
### Are you willing to submit PR?
- [X] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)