[
https://issues.apache.org/jira/browse/AIRFLOW-2511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16599093#comment-16599093
]
Trevor Edwards commented on AIRFLOW-2511:
-----------------------------------------
[https://github.com/apache/incubator-airflow/blob/f18e2550543e455c9701af0995bc393ee6a97b47/airflow/jobs.py#L2254]
This commit seems to be the most likely culprit, as it looks like the only
code that would generate the Original exception from the trace provided. SubDag
operator creates a BackfillJob for its execution.
> Subdag failed by scheduler deadlock
> -----------------------------------
>
> Key: AIRFLOW-2511
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2511
> Project: Apache Airflow
> Issue Type: Bug
> Affects Versions: 1.9.0
> Reporter: Yohei Onishi
> Priority: Major
>
> I am using subdag and sometimes main dag marked failed because of the
> following error. In this case, tasks in the subdag stopped.
> {code:java}
> hourly_dag = DAG(
> hourly_dag_name,
> default_args=dag_default_args,
> params=dag_custom_params,
> schedule_interval=config_values.hourly_job_interval,
> max_active_runs=2)
> hourly_subdag = SubDagOperator(
> task_id='s3_to_hive',
> subdag=LoadFromS3ToHive(
> hourly_dag,
> 's3_to_hive'),
> dag=hourly_dag)
> {code}
> I got this error in main dag. bug in scheduler?
> {code:java}
> [2018-05-22 21:52:19,683] {models.py:1595} ERROR - This Session's transaction
> has been rolled back due to a previous exception during flush. To begin a new
> transaction with this Session, first issue Session.rollback(). Original
> exception was: (_mysql_exceptions.OperationalError) (1213, 'Deadlock found
> when trying to get lock; try restarting transaction') [SQL: 'UPDATE
> task_instance SET state=%s WHERE task_instance.task_id = %s AND
> task_instance.dag_id = %s AND task_instance.execution_date = %s']
> [parameters: ('queued', 'transfer_from_tmp_table_into_cleaned_table',
> 'rfid_warehouse_carton_wh_g_dl_dwh_csv_uqjp_1h.s3_to_hive',
> datetime.datetime(2018, 5, 7, 5, 2))] (Background on this error at:
> http://sqlalche.me/e/e3q8)
> Traceback (most recent call last):
> sqlalchemy.exc.InvalidRequestError: This Session's transaction has been
> rolled back due to a previous exception during flush. To begin a new
> transaction with this Session, first issue Session.rollback(). Original
> exception was: (_mysql_exceptions.OperationalError) (1213, 'Deadlock found
> when trying to get lock; try restarting transaction') [SQL: 'UPDATE
> task_instance SET state=%s WHERE task_instance.task_id = %s AND
> task_instance.dag_id = %s AND task_instance.execution_date = %s']
> [parameters: ('queued', 'transfer_from_tmp_table_into_cleaned_table',
> 'rfid_warehouse_carton_wh_g_dl_dwh_csv_uqjp_1h.s3_to_hive',
> datetime.datetime(2018, 5, 7, 5, 2))] (Background on this error at:
> http://sqlalche.me/e/e3q8)
> [2018-05-22 21:52:19,687] {models.py:1624} INFO - Marking task as FAILED.
> [2018-05-22 21:52:19,688] {base_task_runner.py:98} INFO - Subtask:
> [2018-05-22 21:52:19,688] {slack_hook.py:143} INFO - Message is prepared:
> [2018-05-22 21:52:19,688] {base_task_runner.py:98} INFO - Subtask:
> {"attachments": [{"color": "danger", "text": "", "fields": [{"title": "DAG",
> "value":
> "<https://data-platform-airflow-tokyo.data.fastretailing.com:8080/admin/airflow/tree?dag_id=rfid_warehouse_carton_wh_g_dl_dwh_csv_uqjp_1h|rfid_warehouse_carton_wh_g_dl_dwh_csv_uqjp_1h>",
> "short": true}, {"title": "Owner", "value": "airflow", "short": true},
> {"title": "Task", "value": "s3_to_hive", "short": false}, {"title": "Status",
> "value": "FAILED", "short": false}, {"title": "Execution Time", "value":
> "2018-05-07T05:02:00", "short": true}, {"title": "Duration", "value":
> "826.305929", "short": true}, {"value":
> "<https://data-platform-airflow-tokyo.data.fastretailing.com:8080/admin/airflow/log?dag_id=rfid_warehouse_carton_wh_g_dl_dwh_csv_uqjp_1h&task_id=s3_to_hive&execution_date=2018-05-07T05:02:00|View
> Task Log>", "short": false}]}]}
> [2018-05-22 21:52:19,688] {models.py:1638} ERROR - Failed at executing
> callback
> [2018-05-22 21:52:19,688] {models.py:1639} ERROR - This Session's transaction
> has been rolled back due to a previous exception during flush. To begin a new
> transaction with this Session, first issue Session.rollback(). Original
> exception was: (_mysql_exceptions.OperationalError) (1213, 'Deadlock found
> when trying to get lock; try restarting transaction') [SQL: 'UPDATE
> task_instance SET state=%s WHERE task_instance.task_id = %s AND
> task_instance.dag_id = %s AND task_instance.execution_date = %s']
> [parameters: ('queued', 'transfer_from_tmp_table_into_cleaned_table',
> 'rfid_warehouse_carton_wh_g_dl_dwh_csv_uqjp_1h.s3_to_hive',
> datetime.datetime(2018, 5, 7, 5, 2))] (Background on this error at:
> http://sqlalche.me/e/e3q8)
> {code}
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)