FYI. While testing 5.7 I have found
https://issues.apache.org/jira/browse/AIRFLOW-7001 - fix is in
https://github.com/apache/airflow/pull/7641

It looks like Airflow could not work with MySQL 5.7 because it tried to
insert '2020-03-07 07:32:34.121705+00:00' format of time for TIMESTAMP
field. For 5.6 it was ok because timezone was IGNORED (!) (luckily we
always inserted it with UTC timezone I believe). For MySQL 5.7 it started
to fail with "invalid datetime format' 😱.

J.


On Mon, Mar 2, 2020 at 8:48 AM Jarek Potiuk <[email protected]>
wrote:

> Or actually we can force the charset to utf8mb3 to keep the potential to
> use most utf8 characters in the ids/keys I will test it shortly but it
> should work.
>
> On Mon, Mar 2, 2020 at 8:38 AM Jarek Potiuk <[email protected]>
> wrote:
>
>> Another option that came to me - and possibly that's the simplest and
>> best one.
>>
>> We can force charset for only the _id + key columns to be latin1. This
>> might solve all the problems at once with pretty much zero impact on the
>> rest of the system. It seems super-simple.
>>
>>
>> https://stackoverflow.com/questions/18561190/enforce-column-encoding-with-sqlalchemy
>>
>>
>>
>> J.
>>
>>
>> On Sun, Mar 1, 2020 at 9:47 PM Jarek Potiuk <[email protected]>
>> wrote:
>>
>>>
>>>> Shouldn't be a problem in practice as we delete XComs for a task before
>>>> running out again. Might happen in very rare edge casesi guess.
>>>>
>>>
>>>
>>>> Experience has taught me that if you want the mysql optimiser to do
>>>> something sensible: it won't. This may be better in newer versions, but was
>>>> still a problem in 5.5.
>>>>
>>>> Smaller index lookups from where/what? All queries from dags etc would
>>>> not use `WHERE id = ?` as you detail below.
>>>>
>>> In 2.0 there is the "huge" PK (dag_id, task_id, execution_date, key) and
>>> "biggish" secondary index (dag_id, task_id, execution_date). When you
>>> select by all four: ( dag_id, task_id, execution_date, key) - the
>>> optimizer might choose either of the two (depends on how many rows are in
>>> the table, what is the distribution of keys etc.). And my bet is that
>>> having both indexes now might "confuse" the optimiser to always choose
>>> primary key, which *Might* be slower due to its size (but this is pure
>>> speculation seeing similar things in the past). It's just a side comment -
>>> it does not have to be slower and even if it is, it will be likely
>>> negligible overall.
>>>
>>>
>>>> 512 is probably longer than it needs to be, 250 could be long enough,
>>>> or perhaps we have slightly different behaviour (column length) on mysql
>>>> and other databases?
>>>>
>>>
>>> column length is always in characters - but it translates to more bytes
>>> for utf8mb4.... too long.
>>>
>>> Rough calculation says that we should have to decrease key column to
>>> 266. And maybe that's the easiest solution ? Maybe we should simply
>>> decrease the key column size to 266 when we have MySQL and UTF8MB4 as
>>> encoding???? I think this might be a good compromise between backwards
>>> compatibility and not limiting other installations that do not have that
>>> con. I think this will mean that some DAGs might work on Postgres but not
>>> on MySQL with UTF8mb4. But it's at least something that user can deal with
>>> and fix the DAG.
>>>
>>>
>>> Another possibility would be to give a TI an integer PK, and then make
>>>> XCom (ti_id, key) (fk not required).
>>>>
>>>> I guess the hash approach has precedent in Airflow, but I'm not a huge
>>>> fan of possible collisions, however rare (and would like to work towards
>>>> removing it from the current tables.)
>>>>
>>>
>>> Yeah I do not like it either - too much. If we can live with shorter key
>>> size for MySQL - I think that might be the best solution...
>>>
>>>
>>>>
>>>>
>>>> --
>>>
>>> Jarek Potiuk
>>> Polidea <https://www.polidea.com/> | Principal Software Engineer
>>>
>>> M: +48 660 796 129 <+48660796129>
>>> [image: Polidea] <https://www.polidea.com/>
>>>
>>>
>>
>> --
>>
>> Jarek Potiuk
>> Polidea <https://www.polidea.com/> | Principal Software Engineer
>>
>> M: +48 660 796 129 <+48660796129>
>> [image: Polidea] <https://www.polidea.com/>
>>
>>
>
> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129 <+48660796129>
> [image: Polidea] <https://www.polidea.com/>
>
>

-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>

Reply via email to