[ https://issues.apache.org/jira/browse/AIRFLOW-2511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16599024#comment-16599024 ]

Trevor Edwards commented on AIRFLOW-2511:
-----------------------------------------

Doing some digging into this issue: this kind of error should only occur when 
sessions are being misused: 
http://docs.sqlalchemy.org/en/latest/faq/sessions.html#this-session-s-transaction-has-been-rolled-back-due-to-a-previous-exception-during-flush-or-similar

Reviewing 1.10 (1.9.0 is similar), 
https://github.com/apache/incubator-airflow/blob/1.10.0/airflow/models.py#L1602-L1673,
there are several calls to methods that call session.commit() but do not catch 
exceptions and roll back. The broad except at the end of the highlighted region 
in my link also attempts a session.commit() in the method it calls, so if we 
swallow a SQL exception on the way to that line, we will hit exactly this 
exception. I'm not convinced this is what caused this specific issue, but it is 
at least related and probably a bug in its own right. I'll keep digging.
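The failure mode from the SQLAlchemy FAQ above can be sketched with a toy state machine (plain Python, not SQLAlchemy itself; ToySession and its methods are illustrative only): a commit fails mid-flush, the exception is swallowed, and every later commit on the same session raises until rollback() is called.

{code:python}
# Toy model of session "poisoning": after a failed flush the session is
# deactivated, and reuse without rollback() raises InvalidRequestError-style
# errors like the one in this ticket.

class ToySession:
    def __init__(self):
        self._deactivated = False

    def commit(self, fail=False):
        if self._deactivated:
            raise RuntimeError(
                "This Session's transaction has been rolled back due to a "
                "previous exception during flush.")
        if fail:
            # e.g. MySQL error 1213, 'Deadlock found when trying to get lock'
            self._deactivated = True
            raise RuntimeError("Deadlock found when trying to get lock")

    def rollback(self):
        # Restores the session to a usable state.
        self._deactivated = False


session = ToySession()
try:
    session.commit(fail=True)   # deadlock during flush poisons the session
except RuntimeError:
    pass                        # a broad except swallows the real error...

try:
    session.commit()            # ...so the next commit raises this ticket's error
except RuntimeError as exc:
    print(exc)

session.rollback()              # the fix: roll back before reusing the session
session.commit()                # now succeeds
{code}

This is why the commit sites in models.py#L1602-L1673 need a try/except that calls session.rollback() before any further commit is attempted.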

> Subdag failed by scheduler deadlock
> -----------------------------------
>
>                 Key: AIRFLOW-2511
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2511
>             Project: Apache Airflow
>          Issue Type: Bug
>    Affects Versions: 1.9.0
>            Reporter: Yohei Onishi
>            Priority: Major
>
> I am using a subdag, and sometimes the main DAG is marked failed because of 
> the following error. When this happens, the tasks in the subdag stop.
> {code:python}
> hourly_dag = DAG(
>   hourly_dag_name,
>   default_args=dag_default_args,
>   params=dag_custom_params,
>   schedule_interval=config_values.hourly_job_interval,
>   max_active_runs=2)
> hourly_subdag = SubDagOperator(
>   task_id='s3_to_hive',
>   subdag=LoadFromS3ToHive(
>     hourly_dag,
>     's3_to_hive'),
>   dag=hourly_dag)
> {code}
> I got this error in the main DAG. Is this a bug in the scheduler?
> {code}
> [2018-05-22 21:52:19,683] {models.py:1595} ERROR - This Session's transaction 
> has been rolled back due to a previous exception during flush. To begin a new 
> transaction with this Session, first issue Session.rollback(). Original 
> exception was: (_mysql_exceptions.OperationalError) (1213, 'Deadlock found 
> when trying to get lock; try restarting transaction') [SQL: 'UPDATE 
> task_instance SET state=%s WHERE task_instance.task_id = %s AND 
> task_instance.dag_id = %s AND task_instance.execution_date = %s'] 
> [parameters: ('queued', 'transfer_from_tmp_table_into_cleaned_table', 
> 'rfid_warehouse_carton_wh_g_dl_dwh_csv_uqjp_1h.s3_to_hive', 
> datetime.datetime(2018, 5, 7, 5, 2))] (Background on this error at: 
> http://sqlalche.me/e/e3q8)
> Traceback (most recent call last):
> sqlalchemy.exc.InvalidRequestError: This Session's transaction has been 
> rolled back due to a previous exception during flush. To begin a new 
> transaction with this Session, first issue Session.rollback(). Original 
> exception was: (_mysql_exceptions.OperationalError) (1213, 'Deadlock found 
> when trying to get lock; try restarting transaction') [SQL: 'UPDATE 
> task_instance SET state=%s WHERE task_instance.task_id = %s AND 
> task_instance.dag_id = %s AND task_instance.execution_date = %s'] 
> [parameters: ('queued', 'transfer_from_tmp_table_into_cleaned_table', 
> 'rfid_warehouse_carton_wh_g_dl_dwh_csv_uqjp_1h.s3_to_hive', 
> datetime.datetime(2018, 5, 7, 5, 2))] (Background on this error at: 
> http://sqlalche.me/e/e3q8)
> [2018-05-22 21:52:19,687] {models.py:1624} INFO - Marking task as FAILED.
> [2018-05-22 21:52:19,688] {base_task_runner.py:98} INFO - Subtask: 
> [2018-05-22 21:52:19,688] {slack_hook.py:143} INFO - Message is prepared: 
> [2018-05-22 21:52:19,688] {base_task_runner.py:98} INFO - Subtask: 
> {"attachments": [{"color": "danger", "text": "", "fields": [{"title": "DAG", 
> "value": 
> "<https://data-platform-airflow-tokyo.data.fastretailing.com:8080/admin/airflow/tree?dag_id=rfid_warehouse_carton_wh_g_dl_dwh_csv_uqjp_1h|rfid_warehouse_carton_wh_g_dl_dwh_csv_uqjp_1h>",
>  "short": true}, {"title": "Owner", "value": "airflow", "short": true}, 
> {"title": "Task", "value": "s3_to_hive", "short": false}, {"title": "Status", 
> "value": "FAILED", "short": false}, {"title": "Execution Time", "value": 
> "2018-05-07T05:02:00", "short": true}, {"title": "Duration", "value": 
> "826.305929", "short": true}, {"value": 
> "<https://data-platform-airflow-tokyo.data.fastretailing.com:8080/admin/airflow/log?dag_id=rfid_warehouse_carton_wh_g_dl_dwh_csv_uqjp_1h&task_id=s3_to_hive&execution_date=2018-05-07T05:02:00|View
>  Task Log>", "short": false}]}]}
> [2018-05-22 21:52:19,688] {models.py:1638} ERROR - Failed at executing 
> callback
> [2018-05-22 21:52:19,688] {models.py:1639} ERROR - This Session's transaction 
> has been rolled back due to a previous exception during flush. To begin a new 
> transaction with this Session, first issue Session.rollback(). Original 
> exception was: (_mysql_exceptions.OperationalError) (1213, 'Deadlock found 
> when trying to get lock; try restarting transaction') [SQL: 'UPDATE 
> task_instance SET state=%s WHERE task_instance.task_id = %s AND 
> task_instance.dag_id = %s AND task_instance.execution_date = %s'] 
> [parameters: ('queued', 'transfer_from_tmp_table_into_cleaned_table', 
> 'rfid_warehouse_carton_wh_g_dl_dwh_csv_uqjp_1h.s3_to_hive', 
> datetime.datetime(2018, 5, 7, 5, 2))] (Background on this error at: 
> http://sqlalche.me/e/e3q8)
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
