solatis opened a new issue, #34357:
URL: https://github.com/apache/airflow/issues/34357

   ### Apache Airflow version
   
   Other Airflow 2 version (please specify below)
   
   ### What happened
   
   Deadlock detected after task completion. As far as I can see, it's the same (or a similar) issue as described in ticket #19957, but it was requested there that a new issue be opened for occurrences after 2.6.1.
   
   This particular task is a dynamically mapped task that expands into 447 independent task instances.
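   
   For context, a mapped task of this shape is sketched below. The DAG structure, upstream task name, and schedule are illustrative only; the real DAG is redacted:
   
   ```python
   # Illustrative only: a TaskFlow DAG whose mapped task expands into ~447
   # instances, mirroring the shape of the failing (redacted) DAG.
   import pendulum
   from airflow.decorators import dag, task
   
   @dag(
       schedule="*/5 * * * *",  # assumed; the real schedule is redacted
       start_date=pendulum.datetime(2023, 9, 1, tz="UTC"),
       catchup=False,
   )
   def example_mapped_dag():
       @task
       def list_zip_subfiles() -> list[str]:
           # Hypothetical upstream task producing one entry per subfile;
           # in the failing run this list had 447 entries.
           return [f"subfile_{i}.csv" for i in range(447)]
   
       @task
       def split_zip_subfiles(subfile: str) -> str:
           # Each mapped instance processes one subfile independently.
           return subfile
   
       # .expand() creates one task instance per element (map_index 0..446).
       split_zip_subfiles.expand(subfile=list_zip_subfiles())
   
   example_mapped_dag()
   ```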
   
   Task logs:
   
   ```
   ip-172-26-113-94.ec2.internal
   *** Reading remote log from Cloudwatch log_group: [redacted] log_stream: 
dag_id=[redacted]/run_id=scheduled__2023-09-14T01_55_00+00_00/task_id=split_zip_subfiles/map_index=389/attempt=1.log.
   [2023-09-14, 01:58:09 UTC] {{taskinstance.py:1103}} INFO - Dependencies all 
met for dep_context=non-requeueable deps ti=<TaskInstance: 
[redacted].[redacted] scheduled__2023-09-14T01:55:00+00:00 map_index=389 [None]>
   [2023-09-14, 01:58:09 UTC] {{taskinstance.py:1103}} INFO - Dependencies all 
met for dep_context=requeueable deps ti=<TaskInstance: [redacted].[redacted] 
scheduled__2023-09-14T01:55:00+00:00 map_index=389 [None]>
   [2023-09-14, 01:58:09 UTC] {{taskinstance.py:1308}} INFO - Starting attempt 
1 of 4
   [2023-09-14, 01:58:09 UTC] {{taskinstance.py:1327}} INFO - Executing 
<Mapped(_PythonDecoratedOperator): split_zip_subfiles> on 2023-09-14 
01:55:00+00:00
   [2023-09-14, 01:58:09 UTC] {{standard_task_runner.py:57}} INFO - Started 
process 2343 to run task
   [2023-09-14, 01:58:09 UTC] {{standard_task_runner.py:84}} INFO - Running: 
['airflow', 'tasks', 'run', '[redacted]', '[redacted]', 
'scheduled__2023-09-14T01:55:00+00:00', '--job-id', '132515', '--raw', 
'--subdir', 'DAGS_FOLDER/[redacted]', '--cfg-path', '/tmp/tmpinzfkwre', 
'--map-index', '389']
   [2023-09-14, 01:58:09 UTC] {{standard_task_runner.py:85}} INFO - Job 132515: 
Subtask [redacted]
   [2023-09-14, 01:58:09 UTC] {{task_command.py:410}} INFO - Running 
<TaskInstance: [redacted].[redacted] scheduled__2023-09-14T01:55:00+00:00 
map_index=389 [running]> on host ip-172-26-113-94.ec2.internal
   [2023-09-14, 01:58:10 UTC] {{taskinstance.py:1545}} INFO - Exporting env 
vars: AIRFLOW_CTX_DAG_OWNER='airflow' AIRFLOW_CTX_DAG_ID='[redacted]' 
AIRFLOW_CTX_TASK_ID='[redacted]' 
AIRFLOW_CTX_EXECUTION_DATE='2023-09-14T01:55:00+00:00' 
AIRFLOW_CTX_TRY_NUMBER='1' 
AIRFLOW_CTX_DAG_RUN_ID='scheduled__2023-09-14T01:55:00+00:00'
   
   [ redacted ]
   
   [2023-09-14, 01:58:12 UTC] {{python.py:183}} INFO - Done. Returned value 
was: [redacted]
   [2023-09-14, 01:58:12 UTC] {{taskinstance.py:1345}} INFO - Marking task as 
SUCCESS. dag_id=[redacted], task_id=[redacted], map_index=389, 
execution_date=20230914T015500, start_date=20230914T015809, 
end_date=20230914T015812
   [2023-09-14, 01:58:21 UTC] {{standard_task_runner.py:104}} ERROR - Failed to 
execute job 132515 for task split_zip_subfiles 
((psycopg2.errors.DeadlockDetected) deadlock detected
   DETAIL:  Process 3206 waits for ExclusiveLock on tuple (62,16) of relation 
20803 of database 16400; blocked by process 3172.
   Process 3172 waits for ShareLock on transaction 4371918; blocked by process 
2470.
   Process 2470 waits for ShareLock on transaction 4371663; blocked by process 
2659.
   Process 2659 waits for ExclusiveLock on tuple (62,16) of relation 20803 of 
database 16400; blocked by process 3206.
   HINT:  See server log for query details.
   
   [SQL: UPDATE dag_run SET 
last_scheduling_decision=%(last_scheduling_decision)s, 
updated_at=%(updated_at)s WHERE dag_run.id = %(dag_run_id)s]
   [parameters: {'last_scheduling_decision': datetime.datetime(2023, 9, 14, 1, 
57, 37, 455797, tzinfo=Timezone('UTC')), 'updated_at': datetime.datetime(2023, 
9, 14, 1, 57, 37, 533454, tzinfo=Timezone('UTC')), 'dag_run_id': 1861}]
   (Background on this error at: https://sqlalche.me/e/14/e3q8); 2343)
   [2023-09-14, 01:58:21 UTC] {{local_task_job_runner.py:225}} INFO - Task 
exited with return code 1
   [2023-09-14, 01:58:29 UTC] {{taskinstance.py:2653}} INFO - 0 downstream 
tasks scheduled from follow-on schedule check
   ```
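   
   The DETAIL lines form a four-process wait cycle: 3206 → 3172 → 2470 → 2659 → 3206, with processes 3206 and 2659 both contending for an ExclusiveLock on the same `dag_run` tuple (62,16). Per the HINT, the server log carries the other queries involved; while such a window is open, a lock-wait query against the metadata database shows who blocks whom. A rough sketch, using a simplified form of the standard pg_locks/pg_stat_activity join (the connection string is a placeholder):
   
   ```python
   # Sketch: list ungranted lock waits on the metadata DB and the sessions
   # blocking them. Connection string is a placeholder, not the MWAA one.
   import sqlalchemy as sa
   
   engine = sa.create_engine("postgresql+psycopg2://airflow@localhost/airflow")
   
   LOCK_WAITS = sa.text("""
       SELECT blocked.pid        AS blocked_pid,
              blocked_act.query  AS blocked_query,
              blocking.pid       AS blocking_pid,
              blocking_act.query AS blocking_query
       FROM pg_locks blocked
       JOIN pg_stat_activity blocked_act ON blocked_act.pid = blocked.pid
       JOIN pg_locks blocking
         ON  blocking.locktype      = blocked.locktype
         AND blocking.database      IS NOT DISTINCT FROM blocked.database
         AND blocking.relation      IS NOT DISTINCT FROM blocked.relation
         AND blocking.page          IS NOT DISTINCT FROM blocked.page
         AND blocking.tuple         IS NOT DISTINCT FROM blocked.tuple
         AND blocking.transactionid IS NOT DISTINCT FROM blocked.transactionid
         AND blocking.pid          != blocked.pid
       JOIN pg_stat_activity blocking_act ON blocking_act.pid = blocking.pid
       WHERE NOT blocked.granted
   """)
   
   with engine.connect() as conn:
       for row in conn.execute(LOCK_WAITS):
           print(dict(row._mapping))
   ```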
   
   A retry, executed 5 minutes later, succeeded.
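   
   That matches Postgres's deadlock handling: one transaction in the cycle is aborted with `DeadlockDetected` and the rest proceed, so a later retry goes through. Purely to illustrate the failure mode (this is not Airflow's actual internals; function names and connection string are assumed), the losing UPDATE from the traceback wrapped in backoff-and-retry would look roughly like this:
   
   ```python
   # Illustrative retry-on-deadlock wrapper around the UPDATE from the
   # traceback above. NOT Airflow code; all names here are assumptions.
   import time
   
   import sqlalchemy as sa
   from psycopg2.errors import DeadlockDetected
   from sqlalchemy.exc import OperationalError
   
   engine = sa.create_engine("postgresql+psycopg2://airflow@localhost/airflow")
   
   UPDATE_DAG_RUN = sa.text(
       "UPDATE dag_run SET last_scheduling_decision = :decision, "
       "updated_at = :updated WHERE dag_run.id = :run_id"
   )
   
   def update_with_retry(run_id, decision, updated, attempts=3):
       for attempt in range(attempts):
           try:
               with engine.begin() as conn:  # one short transaction per try
                   conn.execute(
                       UPDATE_DAG_RUN,
                       {"decision": decision, "updated": updated, "run_id": run_id},
                   )
               return
           except OperationalError as exc:
               # Postgres aborts one victim of the wait cycle; anything else
               # should propagate rather than be swallowed here.
               if not isinstance(exc.orig, DeadlockDetected):
                   raise
               time.sleep(2 ** attempt)  # back off before the next attempt
       raise RuntimeError(f"dag_run {run_id} still deadlocking after {attempts} attempts")
   ```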
   
   ### What you think should happen instead
   
   No deadlock.
   
   ### How to reproduce
   
   This happens sporadically, in about 0.5% of task runs; there are no deterministic reproduction steps.
   
   ### Operating System
   
   AWS MWAA
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Amazon (AWS) MWAA
   
   ### Deployment details
   
   Airflow 2.6.3 on an MWAA "Large" environment: 3 schedulers, 25 workers, Celery concurrency of 2,2.
   
   
   ### Anything else
   
   Happens in around 0.25 - 0.5% of task runs. After the retry, which runs 5 minutes later, the task passes.
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

