[ 
https://issues.apache.org/jira/browse/AIRFLOW-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756959#comment-16756959
 ] 

Andy Huynh commented on AIRFLOW-3786:
-------------------------------------

Hi [~thayne2], I assume you're using the utf8mb4 encoding, correct? The error 
seems to stem from the fact that in utf8mb4 each character can take up to four 
bytes. The dag_id column (the primary key) is declared as varchar(250), which 
gives a worst-case key length of 1000 bytes (250 x 4) and exceeds the 767-byte 
maximum. Take a look at the top answer here: 

_If you're using utf8mb4, and you have unique indexes on varchar columns that 
are greater than 191 characters in length, you'll need to turn on 
innodb_large_prefix to allow for larger columns in indexes, because utf8mb4 
requires more storage space than utf8 or latin1._ 

([https://stackoverflow.com/questions/6172798/mysql-varchar255-utf8-is-too-long-for-key-but-max-length-is-1000-bytes]).
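
To make the arithmetic concrete, here is a rough sketch of the worst-case key 
size calculation. The helper names are made up for illustration, and the 
per-character byte widths are the usual maximums for these MySQL character 
sets; the 767-byte limit is the one reported in the error.

```python
# Worst-case bytes per character for a few MySQL character sets.
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}

# Index key limit taken from the error message.
KEY_LIMIT_BYTES = 767


def key_bytes(varchar_len, charset):
    """Worst-case index key size in bytes for a VARCHAR(varchar_len) column."""
    return varchar_len * MAX_BYTES_PER_CHAR[charset]


def max_indexable_chars(charset, limit=KEY_LIMIT_BYTES):
    """Largest VARCHAR length whose worst-case key still fits under the limit."""
    return limit // MAX_BYTES_PER_CHAR[charset]


for charset in MAX_BYTES_PER_CHAR:
    size = key_bytes(250, charset)
    verdict = "too long" if size > KEY_LIMIT_BYTES else "ok"
    print(f"{charset}: VARCHAR(250) -> {size} bytes ({verdict}), "
          f"max length {max_indexable_chars(charset)}")
```

For utf8mb4 this gives 1000 bytes for varchar(250) and a maximum indexable 
length of 191 characters, which matches the number in the quoted answer.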

A potentially easy fix would be to lower ID_LEN in airflow/models/base.py, but 
we would need to consider the ramifications of doing so first.
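
One ramification that is easy to check up front is whether any existing ids 
would no longer fit. The snippet below is only a sketch: ids_exceeding and the 
example ids are hypothetical, and 191 is assumed as the new limit because it is 
the largest length that fits in 767 bytes under utf8mb4.

```python
# Assumption: 191 characters * 4 bytes = 764 bytes, just under the 767 limit.
PROPOSED_ID_LEN = 191


def ids_exceeding(dag_ids, id_len):
    """Return the ids that would not fit in a VARCHAR(id_len) column."""
    return [dag_id for dag_id in dag_ids if len(dag_id) > id_len]


# Hypothetical DAG ids, for illustration only.
example_ids = ["daily_report", "x" * 200]

too_long = ids_exceeding(example_ids, PROPOSED_ID_LEN)
print(f"{len(too_long)} id(s) longer than {PROPOSED_ID_LEN} characters")
```

In a real deployment the list of ids would have to come from the existing 
metadata database or the DAG files rather than a hard-coded list.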


> mysql initdb fails because the primary key is too large.
> --------------------------------------------------------
>
>                 Key: AIRFLOW-3786
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3786
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: database
>    Affects Versions: 1.10.2
>            Reporter: Thayne McCombs
>            Priority: Major
>
> When running `airflow initdb` against a MySQL server that uses the utf8mb4 
> character set, I get this error:
> ```
>  sqlalchemy.exc.OperationalError: (MySQLdb._exceptions.OperationalError) 
> (1071, 'Specified key was too long; max key length is 767 bytes') [SQL: 
> '\nCREATE TABLE dag (\n\tdag_id VARCHAR(250) NOT NULL, \n\tis_paused BOOL, 
> \n\tis_subdag BOOL, \n\tis_active BOOL, \n\tlast_scheduler_run DATETIME, 
> \n\tlast_pickled DATETIME, \n\tlast_expired DATETIME, \n\tscheduler_lock 
> BOOL, \n\tpickle_id INTEGER, \n\tfileloc VARCHAR(2000), \n\towners 
> VARCHAR(2000), \n\tPRIMARY KEY (dag_id), \n\tCHECK (is_paused IN (0, 1)), 
> \n\tCHECK (is_subdag IN (0, 1)), \n\tCHECK (is_active IN (0, 1)), \n\tCHECK 
> (scheduler_lock IN (0, 1))\n)\n\n'] (Background on this error at: 
> http://sqlalche.me/e/e3q8)
> ```
> I've found a few Stack Overflow questions from other users who have run into 
> this problem. The workarounds given are to either enable innodb_large_prefix 
> (which I can't do) or use utf8 or ascii encoding for the database (not 
> desirable). Ideally, this should just work or, at the very least, have 
> well-documented workarounds.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
