solatis opened a new issue, #34357:
URL: https://github.com/apache/airflow/issues/34357
### Apache Airflow version
Other Airflow 2 version (please specify below)
### What happened
Deadlock detected after task completion. As far as I can see, this is the same
(or a similar) issue as described in #19957, where it was requested that a new
issue be opened for occurrences after 2.6.1.
This particular task is a mapped task that expands into 447 independent task instances.
Task logs:
```
ip-172-26-113-94.ec2.internal
*** Reading remote log from Cloudwatch log_group: [redacted] log_stream:
dag_id=[redacted]/run_id=scheduled__2023-09-14T01_55_00+00_00/task_id=split_zip_subfiles/map_index=389/attempt=1.log.
[2023-09-14, 01:58:09 UTC] {{taskinstance.py:1103}} INFO - Dependencies all
met for dep_context=non-requeueable deps ti=<TaskInstance:
[redacted].[redacted] scheduled__2023-09-14T01:55:00+00:00 map_index=389 [None]>
[2023-09-14, 01:58:09 UTC] {{taskinstance.py:1103}} INFO - Dependencies all
met for dep_context=requeueable deps ti=<TaskInstance: [redacted].[redacted]
scheduled__2023-09-14T01:55:00+00:00 map_index=389 [None]>
[2023-09-14, 01:58:09 UTC] {{taskinstance.py:1308}} INFO - Starting attempt
1 of 4
[2023-09-14, 01:58:09 UTC] {{taskinstance.py:1327}} INFO - Executing
<Mapped(_PythonDecoratedOperator): split_zip_subfiles> on 2023-09-14
01:55:00+00:00
[2023-09-14, 01:58:09 UTC] {{standard_task_runner.py:57}} INFO - Started
process 2343 to run task
[2023-09-14, 01:58:09 UTC] {{standard_task_runner.py:84}} INFO - Running:
['airflow', 'tasks', 'run', '[redacted]', '[redacted]',
'scheduled__2023-09-14T01:55:00+00:00', '--job-id', '132515', '--raw',
'--subdir', 'DAGS_FOLDER/[redacted]', '--cfg-path', '/tmp/tmpinzfkwre',
'--map-index', '389']
[2023-09-14, 01:58:09 UTC] {{standard_task_runner.py:85}} INFO - Job 132515:
Subtask [redacted]
[2023-09-14, 01:58:09 UTC] {{task_command.py:410}} INFO - Running
<TaskInstance: [redacted].[redacted] scheduled__2023-09-14T01:55:00+00:00
map_index=389 [running]> on host ip-172-26-113-94.ec2.internal
[2023-09-14, 01:58:10 UTC] {{taskinstance.py:1545}} INFO - Exporting env
vars: AIRFLOW_CTX_DAG_OWNER='airflow' AIRFLOW_CTX_DAG_ID='[redacted]'
AIRFLOW_CTX_TASK_ID='[redacted]'
AIRFLOW_CTX_EXECUTION_DATE='2023-09-14T01:55:00+00:00'
AIRFLOW_CTX_TRY_NUMBER='1'
AIRFLOW_CTX_DAG_RUN_ID='scheduled__2023-09-14T01:55:00+00:00'
[ redacted ]
[2023-09-14, 01:58:12 UTC] {{python.py:183}} INFO - Done. Returned value
was: [redacted]
[2023-09-14, 01:58:12 UTC] {{taskinstance.py:1345}} INFO - Marking task as
SUCCESS. dag_id=[redacted], task_id=[redacted], map_index=389,
execution_date=20230914T015500, start_date=20230914T015809,
end_date=20230914T015812
[2023-09-14, 01:58:21 UTC] {{standard_task_runner.py:104}} ERROR - Failed to
execute job 132515 for task split_zip_subfiles
((psycopg2.errors.DeadlockDetected) deadlock detected
DETAIL: Process 3206 waits for ExclusiveLock on tuple (62,16) of relation
20803 of database 16400; blocked by process 3172.
Process 3172 waits for ShareLock on transaction 4371918; blocked by process
2470.
Process 2470 waits for ShareLock on transaction 4371663; blocked by process
2659.
Process 2659 waits for ExclusiveLock on tuple (62,16) of relation 20803 of
database 16400; blocked by process 3206.
HINT: See server log for query details.
[SQL: UPDATE dag_run SET
last_scheduling_decision=%(last_scheduling_decision)s,
updated_at=%(updated_at)s WHERE dag_run.id = %(dag_run_id)s]
[parameters: {'last_scheduling_decision': datetime.datetime(2023, 9, 14, 1,
57, 37, 455797, tzinfo=Timezone('UTC')), 'updated_at': datetime.datetime(2023,
9, 14, 1, 57, 37, 533454, tzinfo=Timezone('UTC')), 'dag_run_id': 1861}]
(Background on this error at: https://sqlalche.me/e/14/e3q8); 2343)
[2023-09-14, 01:58:21 UTC] {{local_task_job_runner.py:225}} INFO - Task
exited with return code 1
[2023-09-14, 01:58:29 UTC] {{taskinstance.py:2653}} INFO - 0 downstream
tasks scheduled from follow-on schedule check
```
A retry, executed 5 minutes later, succeeded.
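Since `psycopg2.errors.DeadlockDetected` is a transient error (PostgreSQL aborts one victim transaction and the others proceed), a retry-with-backoff wrapper around the failing statement would mask it. This is a generic illustrative sketch, not Airflow's actual handling; a stand-in exception class is used so the demo is self-contained:

```python
# Hedged sketch: retrying a database operation that intermittently raises a
# deadlock error, as psycopg2.errors.DeadlockDetected does in the log above.
import time


def retry_on_deadlock(fn, retries=3, delay=0.1, exc_types=(Exception,)):
    """Re-run fn when it raises one of exc_types, backing off between tries."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except exc_types:
            if attempt == retries:
                raise
            time.sleep(delay * 2 ** attempt)  # exponential backoff


# Stand-in for psycopg2.errors.DeadlockDetected in this self-contained demo:
class DeadlockDetected(Exception):
    pass


calls = {"n": 0}


def flaky_update():
    # Simulates the UPDATE on dag_run: fails twice, then succeeds,
    # much like the retried task in this report.
    calls["n"] += 1
    if calls["n"] < 3:
        raise DeadlockDetected("deadlock detected")
    return "committed"


print(retry_on_deadlock(flaky_update, exc_types=(DeadlockDetected,)))
```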
### What you think should happen instead
No deadlock.
### How to reproduce
This occurs sporadically, in about 0.5% of task runs.
### Operating System
AWS MWAA
### Versions of Apache Airflow Providers
_No response_
### Deployment
Amazon (AWS) MWAA
### Deployment details
Version 2.6.3. MWAA "Large" environment. 3 schedulers. 25 workers. Celery
worker concurrency of 2,2.
### Anything else
Happens in roughly 0.25-0.5% of task runs. The retry, executed 5 minutes
later, succeeds.
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)