I am testing now 5.7 in 1.10 branch with the PR here:
https://github.com/apache/airflow/pull/7569 and it seems it's an easy
one one - seems it will "just work".

However while testing it (I analysed a customer problem) I have found
that we have an issue in 2.0 with MySQL (not only 5.7 - it's the same
for 5.6 and even 8.0 (!). It boils down to  an index (at least one) is
to big for the recommended uf8mb4 encoding:

ALTER TABLE xcom ADD CONSTRAINT pk_xcom PRIMARY KEY (dag_id, task_id,
`key`, execution_date).

The utf8mb4 encoding is needed to handle 4-byte UTF characters (mostly
emojis but also some Chinese characters).

The question is what do we do with that. I think we have two options.
Not sure what was the story behind increasing the lengths (apparently
the problem does not

a) decrease sizes of some of the fields (that would make backwards
incompatibility)
b) partial indexes of the fields (not really - this is primary key so
I guess it's not a good idea, although MYSQL supports indexing only
part of the string - we could cut key uniqueness at 200 chars for
example - but again, backwards incompatibility is a problem)
c) revert to id - generated primary kay and make secondary index
without the key - there are usually not many xcoms per task, so I
would not expect any problems - actually I would expect this will
speed up searches because the index will be much, much smaller (50%
more or less). I think this is the most viable option.

Any other ideas?

I described all details in this issue:
https://issues.apache.org/jira/browse/AIRFLOW-6947

Here is similar story of somone who had similar issue
https://serversforhackers.com/c/mysql-utf8-and-indexing

J.


Jarek Potiuk
Polidea | Principal Software Engineer

M: +48 660 796 129

Reply via email to