[ 
https://issues.apache.org/jira/browse/AIRFLOW-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16727960#comment-16727960
 ] 

jack commented on AIRFLOW-3045:
-------------------------------

Shouldn't have this break any MySQL back-end installed since 1.10.0 ?

> Duplicate entry error with MySQL when update task_instances
> -----------------------------------------------------------
>
>                 Key: AIRFLOW-3045
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3045
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: database
>    Affects Versions: 1.10.0
>            Reporter: Haotian Wu
>            Assignee: Haotian Wu
>            Priority: Major
>
> h3. How to reproduce
> # Setup apach-airflow==1.10.0 with MySQL, bring up both webserver and 
> scheduler.
> # Add a DAG and it becomes running but none of the task will be actually 
> executed.
> # Manually trigger another run for the same dag, airflow scheduler will crash 
> with error {{sqlalchemy.exc.IntegrityError: 
> (_mysql_exceptions.IntegrityError) (1062, "Duplicate entry 
> 'xxxxxxx-yyyyyy-YYYY-MM-DD ...' for key 'PRIMARY'")}}
> h3. The Reason
> In Airflow-1.10.0, execution_date field of task_instance is changed from 
> DateTime to Timestamp. However, in MySQL first Timestamp column in a table is 
> declared with {{ON UPDATE CURRENT_TIMESTAMP}} clause. Database in MySQL will 
> look like below after {{airflow initdb}}.
> | Field           | Type          | Null | Key | Default              | Extra 
> |
> | task_id         | varchar(250)  | NO   | PRI | NULL                 |       
> |
> | dag_id          | varchar(250)  | NO   | PRI | NULL                 |       
> |
> | execution_date  | timestamp(6)  | NO   | PRI | CURRENT_TIMESTAMP(6) |  on 
> update CURRENT_TIMESTAMP(6)      |
> # When a task_instance is updated from state NULL to state "scheduled", its 
> execution_date is also reset to current timestamp automatically. 
> # task_instance is linked to a given dag_run by same execution_date, so 
> changed execution_date means task_instance is no longer linked to any known 
> dag_run.
> # If there are more than one dag_run for the same dag, multiple task_instance 
> with same <task_id, dag_id> will be "unlinked" to their dag_run. Airflow 
> scheduler will try to update them to state NULL and thus try to update them 
> to the same <task_id, dag_id, execution_date> primary key.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to