[
https://issues.apache.org/jira/browse/AIRFLOW-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756959#comment-16756959
]
Andy Huynh commented on AIRFLOW-3786:
-------------------------------------
Hi [~thayne2], I assume you're using utf8mb4 encoding? The error seems to stem
from the fact that MySQL reserves four bytes per character for utf8mb4 when
sizing an index key. The dag_id column (the primary key) is VARCHAR(250),
producing a key length of 250 × 4 = 1000 bytes, which exceeds the 767-byte
maximum. Take a look at the top answer here:
_If you're using utf8mb4, and you have unique indexes on varchar columns that
are greater than 191 characters in length, you'll need to turn on
innodb_large_prefix to allow for larger columns in indexes, because utf8mb4
requires more storage space than utf8 or latin1._
(https://stackoverflow.com/questions/6172798/mysql-varchar255-utf8-is-too-long-for-key-but-max-length-is-1000-bytes)
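The byte arithmetic above can be checked with a short Python sketch. It assumes
MySQL's maximum bytes per character for these charsets (1 for latin1, 3 for
utf8, 4 for utf8mb4) and the 767-byte InnoDB limit from the error message;
BYTES_PER_CHAR and key_length_bytes are hypothetical helpers for illustration,
not part of Airflow or MySQL.

```python
# Assumed maximum bytes MySQL reserves per character when sizing an index key.
BYTES_PER_CHAR = {"ascii": 1, "latin1": 1, "utf8": 3, "utf8mb4": 4}

# InnoDB index key prefix limit without innodb_large_prefix (from the error).
INNODB_MAX_KEY_BYTES = 767


def key_length_bytes(varchar_len: int, charset: str) -> int:
    """Bytes an index on VARCHAR(varchar_len) needs in the given charset."""
    return varchar_len * BYTES_PER_CHAR[charset]


# dag_id is VARCHAR(250); check it against the limit for each charset.
for charset in ("latin1", "utf8", "utf8mb4"):
    needed = key_length_bytes(250, charset)
    status = "OK" if needed <= INNODB_MAX_KEY_BYTES else "too long"
    print(f"{charset:8} VARCHAR(250) -> {needed:4} bytes ({status})")
```

Only utf8mb4 pushes the 250-character key past 767 bytes; latin1 (250 bytes)
and utf8 (750 bytes) still fit.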
A potential easy fix would be to lower ID_LEN in airflow/models/base.py (to 191
or less for utf8mb4), but the ramifications of doing so would probably need to
be considered first.
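To gauge how far ID_LEN would have to drop, a similar sketch divides the byte
limit by the bytes per character. It assumes 767 bytes as the default limit and
3072 bytes as the InnoDB limit when innodb_large_prefix is enabled with a
DYNAMIC or COMPRESSED row format and the default 16 KB page size; max_id_len is
a hypothetical helper, not an Airflow function.

```python
# Assumed maximum bytes per character for each charset.
BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}


def max_id_len(limit_bytes: int, charset: str) -> int:
    """Largest VARCHAR length whose index fits within limit_bytes."""
    return limit_bytes // BYTES_PER_CHAR[charset]


# 767: default InnoDB limit; 3072: limit with innodb_large_prefix enabled.
for limit in (767, 3072):
    for charset in ("utf8", "utf8mb4"):
        print(f"limit={limit:4} {charset:8} max ID_LEN={max_id_len(limit, charset)}")
```

For utf8mb4 this gives 191 characters under the 767-byte limit, matching the
figure in the quoted answer, while the larger prefix limit would leave room for
the current 250.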
> mysql initdb fails because the primary key is too large.
> ---------------------------------------------------------
>
> Key: AIRFLOW-3786
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3786
> Project: Apache Airflow
> Issue Type: Bug
> Components: database
> Affects Versions: 1.10.2
> Reporter: Thayne McCombs
> Priority: Major
>
> When running `airflow initdb` against a MySQL server that uses the utf8mb4
> character set, I get this error:
> ```
> sqlalchemy.exc.OperationalError: (MySQLdb._exceptions.OperationalError)
> (1071, 'Specified key was too long; max key length is 767 bytes') [SQL:
> '\nCREATE TABLE dag (\n\tdag_id VARCHAR(250) NOT NULL, \n\tis_paused BOOL,
> \n\tis_subdag BOOL, \n\tis_active BOOL, \n\tlast_scheduler_run DATETIME,
> \n\tlast_pickled DATETIME, \n\tlast_expired DATETIME, \n\tscheduler_lock
> BOOL, \n\tpickle_id INTEGER, \n\tfileloc VARCHAR(2000), \n\towners
> VARCHAR(2000), \n\tPRIMARY KEY (dag_id), \n\tCHECK (is_paused IN (0, 1)),
> \n\tCHECK (is_subdag IN (0, 1)), \n\tCHECK (is_active IN (0, 1)), \n\tCHECK
> (scheduler_lock IN (0, 1))\n)\n\n'] (Background on this error at:
> http://sqlalche.me/e/e3q8)
> ```
> I've found a few Stack Overflow questions from other users who have run into
> this problem. The workarounds given are either to enable innodb_large_prefix
> (which I can't do) or to use utf8 or ascii encoding for the database (not
> desirable). Ideally, this should just work, or at the very least there should
> be well-documented workarounds for this problem.