Another option that came to me - and possibly that's the simplest and best
one.

We can force charset for only the _id + key columns to be latin1. This
might solve all the problems at once with pretty much zero impact on the
rest of the system. It seems super-simple.

https://stackoverflow.com/questions/18561190/enforce-column-encoding-with-sqlalchemy



J.


On Sun, Mar 1, 2020 at 9:47 PM Jarek Potiuk <[email protected]>
wrote:

>
>> Shouldn't be a problem in practice as we delete XComs for a task before
>> running out again. Might happen in very rare edge casesi guess.
>>
>
>
>> Experience has taught me that if you want the mysql optimiser to do
>> something sensible: it won't. This may be better in newer versions, but was
>> still a problem in 5.5.
>>
>> Smaller index lookups from where/what? All queries from dags etc would
>> not use `WHERE id = ?` as you detail below.
>>
> In 2.0 there is the "huge" PK (dag_id, task_id, execution_date, key) and
> "biggish" secondary index (dag_id, task_id, execution_date). When you
> select by all four: ( dag_id, task_id, execution_date, key) - the
> optimizer might choose either of the two (depends on how many rows are in
> the table, what is the distribution of keys etc.). And my bet is that
> having both indexes now might "confuse" the optimiser to always choose
> primary key, which *Might* be slower due to its size (but this is pure
> speculation seeing similar things in the past). It's just a side comment -
> it does not have to be slower and even if it is, it will be likely
> negligible overall.
>
>
>> 512 is probably longer than it needs to be, 250 could be long enough, or
>> perhaps we have slightly different behaviour (column length) on mysql and
>> other databases?
>>
>
> column length is always in characters - but it translates to more bytes
> for utf8mb4.... too long.
>
> Rough calculation says that we should have to decrease key column to 266.
> And maybe that's the easiest solution ? Maybe we should simply decrease the
> key column size to 266 when we have MySQL and UTF8MB4 as encoding???? I
> think this might be a good compromise between backwards compatibility and
> not limiting other installations that do not have that con. I think this
> will mean that some DAGs might work on Postgres but not on MySQL with
> UTF8mb4. But it's at least something that user can deal with and fix the
> DAG.
>
>
> Another possibility would be to give a TI an integer PK, and then make
>> XCom (ti_id, key) (fk not required).
>>
>> I guess the hash approach has precedent in Airflow, but I'm not a huge
>> fan of possible collisions, however rare (and would like to work towards
>> removing it from the current tables.)
>>
>
> Yeah I do not like it either - too much. If we can live with shorter key
> size for MySQL - I think that might be the best solution...
>
>
>>
>>
>> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129 <+48660796129>
> [image: Polidea] <https://www.polidea.com/>
>
>

-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>

Reply via email to