[ https://issues.apache.org/jira/browse/AIRFLOW-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17005764#comment-17005764 ]
t oo commented on AIRFLOW-4464:
-------------------------------
Ran into this. The IntegrityError was raised here:
https://github.com/apache/airflow/blob/1.10.6/airflow/models/dagrun.py#L392-L399

(MySQLdb._exceptions.IntegrityError) (1062, "Duplicate entry
  File "/home/ec2-user/venv/local/lib/python2.7/site-packages/airflow/jobs/scheduler_job.py", line 157, in _run_file_processor
    pickle_dags)
  File "/home/ec2-user/venv/local/lib/python2.7/site-packages/airflow/utils/db.py", line 74, in wrapper
    return func(*args, **kwargs)
  File "/home/ec2-user/venv/local/lib/python2.7/site-packages/airflow/jobs/scheduler_job.py", line 1591, in process_file
    self._process_dags(dagbag, dags, ti_keys_to_schedule)
  File "/home/ec2-user/venv/local/lib/python2.7/site-packages/airflow/jobs/scheduler_job.py", line 1276, in _process_dags
    self._process_task_instances(dag, tis_out)
  File "/home/ec2-user/venv/local/lib/python2.7/site-packages/airflow/utils/db.py", line 74, in wrapper
    return func(*args, **kwargs)
  File "/home/ec2-user/venv/local/lib/python2.7/site-packages/airflow/jobs/scheduler_job.py", line 761, in _process_task_instances
    run.verify_integrity(session=session)
  File "/home/ec2-user/venv/local/lib/python2.7/site-packages/airflow/utils/db.py", line 70, in wrapper
    return func(*args, **kwargs)
  File "/home/ec2-user/venv/local/lib/python2.7/site-packages/airflow/models/dagrun.py", line 399, in verify_integrity
    session.commit()
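
For anyone who wants to see this kind of duplicate-key failure without a MySQL instance, here is a minimal sketch. It is an emulation, not Airflow code: the table `task_instance_demo` and the helper `insert_task_ids` are hypothetical, and SQLite's `NOCASE` collation is only a stand-in for MySQL's default case-insensitive comparison (it folds ASCII letters only). The same inserts go through when the key column uses a binary collation.

```python
import sqlite3


def insert_task_ids(collation, task_ids):
    """Insert task ids into a throwaway table keyed on (task_id, dag_id).

    `collation` is the SQLite collation applied to task_id.
    Returns the ids that were rejected as duplicate keys.
    """
    if collation not in ("NOCASE", "BINARY"):
        raise ValueError("unsupported collation: %r" % (collation,))

    conn = sqlite3.connect(":memory:")
    # Hypothetical stand-in table; Airflow's real task_instance table has more columns.
    conn.execute(
        "CREATE TABLE task_instance_demo ("
        " task_id TEXT COLLATE {} NOT NULL,"
        " dag_id TEXT NOT NULL,"
        " PRIMARY KEY (task_id, dag_id))".format(collation)
    )

    rejected = []
    for task_id in task_ids:
        try:
            conn.execute(
                "INSERT INTO task_instance_demo (task_id, dag_id) VALUES (?, ?)",
                (task_id, "example_dag"),
            )
        except sqlite3.IntegrityError:
            # The key compares equal to an existing row under this collation.
            rejected.append(task_id)
    conn.close()
    return rejected


# Case-insensitive key: "FOO" collides with "foo".
print(insert_task_ids("NOCASE", ["foo", "FOO"]))
# Binary (case-sensitive) key: both rows are accepted.
print(insert_task_ids("BINARY", ["foo", "FOO"]))
```
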
> Fix case-insensitive id columns in mysql
> ----------------------------------------
>
> Key: AIRFLOW-4464
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4464
> Project: Apache Airflow
> Issue Type: Improvement
> Components: database
> Reporter: Joshua Carp
> Assignee: Joshua Carp
> Priority: Minor
> Labels: mysql
>
> By default, string comparisons in mysql are case-insensitive, so the task ids
> "foo" and "FOO" are treated as identical. This means that a dag with those
> task ids will fail to schedule with a sqlalchemy `IntegrityError` using
> mysql, but not postgres or sqlite. This situation probably doesn't happen
> often, and users probably shouldn't use task ids that are identical except
> for case, but I think we should improve the behavior here. A few options:
>
> * Configure sqlalchemy to use a binary collation for string id columns under
> mysql so that string comparisons are case-sensitive.
> * Require dag and task ids to be unique regardless of case. This would be a
> breaking change.
> * Document that mysql users should configure mysql to use binary collations
> for string types by default. This would still show users a 500 if the
> database isn't configured correctly.
>
> I'll submit a pull request with a failing unit test to describe the issue.
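
The second option in the description above (ids unique regardless of case) could be checked before anything reaches the database. This is only a sketch under assumptions: `find_case_insensitive_collisions` is a hypothetical helper, not an Airflow function, and `str.lower()` only approximates how a MySQL collation compares strings.

```python
from collections import defaultdict


def find_case_insensitive_collisions(task_ids):
    """Group ids that differ only by case.

    Returns a list of groups, each holding two or more colliding ids.
    """
    groups = defaultdict(list)
    for task_id in task_ids:
        # lower() is an approximation of MySQL's case-insensitive comparison.
        groups[task_id.lower()].append(task_id)
    return [ids for ids in groups.values() if len(ids) > 1]


collisions = find_case_insensitive_collisions(["foo", "FOO", "bar"])
if collisions:
    print("task ids differ only by case:", collisions)
```

A check like this would make the failure visible when the dag is parsed rather than as an `IntegrityError` in the scheduler, at the cost of the breaking change the description mentions.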
--
This message was sent by Atlassian Jira
(v8.3.4#803005)