cedric-fauth opened a new issue, #63183:
URL: https://github.com/apache/airflow/issues/63183
### Apache Airflow version
Other Airflow 3 version (please specify below)
### If "Other Airflow 3 version" selected, which one?
3.1.6
### What happened?
Many of my Airflow DAGs are failing because some tasks are killed and marked
as failed even though they completed successfully. This seems to happen
randomly, but multiple DAGs often fail at around the same time.
### Behavior:
Just after a task logs its last message and returns, an internal error
occurs that involves the Airflow SDK:
```
{"logger": "airflow.task.operators.dags.custom_operators.DjangoOperator",
"filename": "python.py", "lineno": 216, "event": "Done. Returned value was:
None", "level": "info"}
{"logger": "task", "filename": "task_runner.py", "lineno": 1562,
"error_detail": [{"exc_type": "AirflowRuntimeError", "exc_value":
"API_SERVER_ERROR: {'status_code': 409, 'message': 'Server returned error',
'detail': {'detail': {'reason': 'invalid_state', 'message': 'TI was not in the
running state so it cannot be updated', 'previous_state': 'success'}}}",
"exc_notes": [], "syntax_error": null, "is_cause": false, "frames":
[{"filename":
"/bin/app/lib/python3.12/site-packages/airflow/sdk/execution_time/task_runner.py",
"lineno": 1555, "name": "main"}, {"filename":
"/bin/app/lib/python3.12/site-packages/airflow/sdk/execution_time/task_runner.py",
"lineno": 1083, "name": "run"}, {"filename":
"/bin/app/lib/python3.12/site-packages/airflow/sdk/execution_time/comms.py",
"lineno": 206, "name": "send"}, {"filename":
"/bin/app/lib/python3.12/site-packages/airflow/sdk/execution_time/comms.py",
"lineno": 270, "name": "_get_response"}, {"filename":
"/bin/app/lib/python3.12/site-packages/airflow/sdk/execution_time/comms.py",
"lineno": 257, "name": "_from_frame"}],
"is_group": false, "exceptions": []}], "event": "Top level error", "level":
"error"}
{"exit_code": 1, "event": "Process exited abnormally", "level": "warning",
"logger": "task"}
```
[{\"exc_notes\":[],\"exc_type\":\"AirflowRuntimeError\",\"exc_value\":\"API_SERVER_ERROR:
{'status_code': 409, 'message': 'Server returned error', 'detail': {'detail':
{'reason': 'invalid_state', 'message': 'TI was not in the running state so it
cannot be updated', 'previous_state':
'success'}}}\",\"exceptions\":[],\"frames\":[{\"filename\":\"/bin/app/lib/python3.12/site-packages/airflow/sdk/execution_time/task_runner.py\",\"lineno\":1555,\"name\":\"main\"},{\"filename\":\"/bin/app/lib/python3.12/site-packages/airflow/sdk/execution_time/task_runner.py\",\"lineno\":1083,\"name\":\"run\"},{\"filename\":\"/bin/app/lib/python3.12/site-packages/airflow/sdk/execution_time/comms.py\",\"lineno\":206,\"name\":\"send\"},{\"filename\":\"/bin/app/lib/python3.12/site-packages/airflow/sdk/execution_time/comms.py\",\"lineno\":270,\"name\":\"_get_response\"},{\"filename\":\"/bin/app/lib/python3.12/site-packages/airflow/sdk/execution_time/comms.py\",\"lineno\":257,\"name\":\"_from_frame\"}],
\"is_cause\":false,\"is_group\":false,\"syntax_error\":\"\\u003cnil\\u003e\"}]
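For triage, the interesting fields are buried inside the `exc_value` string. The following is a minimal sketch for extracting them, assuming `exc_value` always has the shape `API_SERVER_ERROR: <Python dict repr>` as in the log above. The helper name `parse_api_server_error` is hypothetical and not part of Airflow.

```python
import ast

# exc_value copied from the log output in this issue.
EXC_VALUE = (
    "API_SERVER_ERROR: {'status_code': 409, 'message': 'Server returned error', "
    "'detail': {'detail': {'reason': 'invalid_state', 'message': 'TI was not in the "
    "running state so it cannot be updated', 'previous_state': 'success'}}}"
)


def parse_api_server_error(exc_value: str) -> dict:
    """Hypothetical helper: strip the prefix and parse the dict repr that follows.

    Assumes the payload is a Python literal, which is how it appears in the log.
    """
    prefix, _, payload = exc_value.partition(": ")
    if prefix != "API_SERVER_ERROR":
        raise ValueError(f"unexpected error prefix: {prefix!r}")
    # literal_eval only evaluates literals, so it is safe for untrusted log text.
    return ast.literal_eval(payload)


parsed = parse_api_server_error(EXC_VALUE)
inner = parsed["detail"]["detail"]
print(parsed["status_code"], inner["reason"], inner["previous_state"])
# prints: 409 invalid_state success
```

The `previous_state` of `success` is the notable part: the server already considered the task instance finished when the update arrived.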
### What you think should happen instead?
After the task ends, it should update its state to success, and that update
should not raise an error. It looks as if Airflow has already set the state to
`success` and then tries to update it again after the task has completed.
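To make the expectation concrete, here is a toy model of the behavior, not Airflow's actual code. It assumes the server rejects any update once the task instance has left `running`, which matches the 409 in the log. All names (`TaskInstance`, `InvalidState`, `strict_update`, `idempotent_update`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaskInstance:
    """Toy stand-in for a task instance row; not the Airflow model."""
    state: str = "running"


class InvalidState(Exception):
    """Mirrors the 'invalid_state' reason seen in the 409 response."""

    def __init__(self, previous_state: str):
        super().__init__(f"TI was not in the running state: {previous_state}")
        self.previous_state = previous_state


def strict_update(ti: TaskInstance, new_state: str) -> None:
    """Reject every update unless the TI is running (the behavior in the log)."""
    if ti.state != "running":
        raise InvalidState(ti.state)
    ti.state = new_state


def idempotent_update(ti: TaskInstance, new_state: str) -> None:
    """Treat a repeated request for the state already stored as a no-op."""
    if ti.state == new_state:
        return
    strict_update(ti, new_state)


ti = TaskInstance()
strict_update(ti, "success")          # first update goes through
try:
    strict_update(ti, "success")      # a second, duplicate update is rejected
except InvalidState as err:
    print("rejected, previous_state =", err.previous_state)
idempotent_update(ti, "success")      # the tolerant variant does not raise
```

Whether the duplicate update comes from a retry, a second writer, or something else is unknown; the sketch only shows why a second `success` update would produce this error and what a tolerant handler could look like.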
### How to reproduce
I don't know how to reproduce this, but there are similar issues that cover
other cases of `TI was not in the running state so it cannot be updated`.
### Operating System
WSL
### Versions of Apache Airflow Providers
apache-airflow[celery, redis, amazon, postgres, docker, fab]==3.1.6
apache-airflow-providers-fab==3.1.2
### Deployment
Other
### Deployment details
Deployed on ECS using a Fargate cluster, with one ECS task per component
(worker, scheduler, etc.). Each task runs a single Airflow container plus
additional logging sidecar containers.
### Anything else?
_No response_
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [x] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)