FYI. While testing 5.7 I have found https://issues.apache.org/jira/browse/AIRFLOW-7001 - fix is in https://github.com/apache/airflow/pull/7641
It looks like Airflow could not work with MySQL 5.7 because it tried to insert '2020-03-07 07:32:34.121705+00:00' format of time for TIMESTAMP field. For 5.6 it was ok because timezone was IGNORED (!) (luckily we always inserted it with UTC timezone I believe). For MySQL 5.7 it started to fail with "invalid datetime format' 😱. J. On Mon, Mar 2, 2020 at 8:48 AM Jarek Potiuk <[email protected]> wrote: > Or actually we can force the charset to utf8mb3 to keep the potential to > use most utf8 characters in the ids/keys I will test it shortly but it > should work. > > On Mon, Mar 2, 2020 at 8:38 AM Jarek Potiuk <[email protected]> > wrote: > >> Another option that came to me - and possibly that's the simplest and >> best one. >> >> We can force charset for only the _id + key columns to be latin1. This >> might solve all the problems at once with pretty much zero impact on the >> rest of the system. It seems super-simple. >> >> >> https://stackoverflow.com/questions/18561190/enforce-column-encoding-with-sqlalchemy >> >> >> >> J. >> >> >> On Sun, Mar 1, 2020 at 9:47 PM Jarek Potiuk <[email protected]> >> wrote: >> >>> >>>> Shouldn't be a problem in practice as we delete XComs for a task before >>>> running out again. Might happen in very rare edge casesi guess. >>>> >>> >>> >>>> Experience has taught me that if you want the mysql optimiser to do >>>> something sensible: it won't. This may be better in newer versions, but was >>>> still a problem in 5.5. >>>> >>>> Smaller index lookups from where/what? All queries from dags etc would >>>> not use `WHERE id = ?` as you detail below. >>>> >>> In 2.0 there is the "huge" PK (dag_id, task_id, execution_date, key) and >>> "biggish" secondary index (dag_id, task_id, execution_date). When you >>> select by all four: ( dag_id, task_id, execution_date, key) - the >>> optimizer might choose either of the two (depends on how many rows are in >>> the table, what is the distribution of keys etc.). And my bet is that >>> having both indexes now might "confuse" the optimiser to always choose >>> primary key, which *Might* be slower due to its size (but this is pure >>> speculation seeing similar things in the past). It's just a side comment - >>> it does not have to be slower and even if it is, it will be likely >>> negligible overall. >>> >>> >>>> 512 is probably longer than it needs to be, 250 could be long enough, >>>> or perhaps we have slightly different behaviour (column length) on mysql >>>> and other databases? >>>> >>> >>> column length is always in characters - but it translates to more bytes >>> for utf8mb4.... too long. >>> >>> Rough calculation says that we should have to decrease key column to >>> 266. And maybe that's the easiest solution ? Maybe we should simply >>> decrease the key column size to 266 when we have MySQL and UTF8MB4 as >>> encoding???? I think this might be a good compromise between backwards >>> compatibility and not limiting other installations that do not have that >>> con. I think this will mean that some DAGs might work on Postgres but not >>> on MySQL with UTF8mb4. But it's at least something that user can deal with >>> and fix the DAG. >>> >>> >>> Another possibility would be to give a TI an integer PK, and then make >>>> XCom (ti_id, key) (fk not required). >>>> >>>> I guess the hash approach has precedent in Airflow, but I'm not a huge >>>> fan of possible collisions, however rare (and would like to work towards >>>> removing it from the current tables.) >>>> >>> >>> Yeah I do not like it either - too much. If we can live with shorter key >>> size for MySQL - I think that might be the best solution... >>> >>> >>>> >>>> >>>> -- >>> >>> Jarek Potiuk >>> Polidea <https://www.polidea.com/> | Principal Software Engineer >>> >>> M: +48 660 796 129 <+48660796129> >>> [image: Polidea] <https://www.polidea.com/> >>> >>> >> >> -- >> >> Jarek Potiuk >> Polidea <https://www.polidea.com/> | Principal Software Engineer >> >> M: +48 660 796 129 <+48660796129> >> [image: Polidea] <https://www.polidea.com/> >> >> > > -- > > Jarek Potiuk > Polidea <https://www.polidea.com/> | Principal Software Engineer > > M: +48 660 796 129 <+48660796129> > [image: Polidea] <https://www.polidea.com/> > > -- Jarek Potiuk Polidea <https://www.polidea.com/> | Principal Software Engineer M: +48 660 796 129 <+48660796129> [image: Polidea] <https://www.polidea.com/>
