potiuk commented on issue #25144:
URL: https://github.com/apache/airflow/issues/25144#issuecomment-1190089091

   BTW. The limited index size in MySQL is a bummer. This is an extremely bad 
design choice especially connected with the fact that the actual size taken by 
the index depends on collation.
   
   Sorry for the RANT, but I am so fed-up with the problems MySQL caused use 
because of they bad design choices and wrong (IMHO) decisions that it is a 
blueprint of "how not to treat your users". And whenever I see someone smart 
who understands how it all works I will use all the strenght I have to convince 
them to dump MySQL because their decision make their users to suffer (and in 
our case - our users and us).
   
   When I first learned (2 years ago when it caused us the problem) was when a 
user wanted to use 🦅 in the id of the DAG. This is extended UTF-8 set and 
required utf8mb4 and it was not possible - precisely because suddenly when you 
used utf8mb4 the index was to big on (back then) MySQL 5.6. And well - this is 
a very legitimat And then it got worse - if you look at how collaction charset 
approach changes in MySQL 5.7. 8 and possibly in the future - this is nothing 
but madness.
   
   5.6: default charset in 5.6 was .... Latin2 and collation is Swedish. This 
is what - unknowingly by the admins  - is by far biggest charset/collation 
MySQL database have. Yep. Swedish. BTW. Back then even if indexes of MySQL 5.6 
were smaller, this was not a problem because the index for latin2 has huge 
capacity. If you wanted to change to utf8 you were immediately cutting it down 
- which basically means that definition of your database had to take into 
account which encoding/collation should be used. And what's worse - utf8mb3 or 
mb4 is designed in the way that it requires 1,2,3 (or 4) characters - Why on 
earth the limit for index for utf8mb3 is 1/3 of latin-2? and utf8mb4 ? You can 
fit the same number of utf8mb*  characters in the same space as latin-2 if you 
do not use special characters. UTF8 was DESIGNED for that. It was supposed to 
take exactly the same amount of space as ASCII for ASCII characters. Why on 
earth MySQL decided that - no matter which characters you have you can
  only fit 1/3 of them in the index? Beats me. It must have been for 
performance, but it is a terrible choice for unsuspected user who - if they 
want to go utf route has to basically redesign their database. 
   
   And (as you noticed) max index size of your InnoDB storage is different, 
depending on your database configuration. What you might also not be aware - 
the size of the index can go down depending on the page size you choose for the 
DB. So generally speaking if you want to have design of schema that will work 
for all the different charset, page sizes, COMPACT setting etc. then your index 
has to have ..... bear with me ..... "some small-ish size". This is not defined 
what exactly is "Small enough". The index size in 767 (strange round number 
isn't it???)  Is it 300 characters ? maybe, in some cases of latin2, but if you 
use utf8mb4, COMPACT and small page size it is WAY to much . worst possible 
case and your index size cannot be ~ 80-ish characters or so (or maybe less, 
don't know). And remember  - that if you have unique columns or foreign keys, 
where mutliple columns are are involved, the size of the index is COMBINED size 
of the columns. If you want to combine for columns, then - e
 ach of them has to have > 20 characters. And if you want to add more, then, 
well, you are out of luck. Good? Not really.
   
   But the really BAD thing is that when you create your schema - you do not 
KNOW what the limit is. It is arbitrary based on decisions of the deployment 
side.
   
   What's more - I am not sure if you know, but you can specify different 
encoding/collation not only for the database, but also different for a schema, 
table and different for the column (the last one BTW is a trick we managed to 
achieve Utf8mb4 databases with our index - we default ALL ID columns to be 
utf8mb3 if utf8mb4 is used for the database). But what's more you can also 
specifiy encoding and collation on the client (WHAT?) which defaults to what is 
your LANGUAAGE on the client (????)  and BAD things happen if those two 
encodings/collations do not agree with each other.
   
   5.7: This was apparently noticed they increased the size of index to 3072 
(Andther round number). But all the problems remained: 
   > If you reduce the InnoDB [page 
size](https://dev.mysql.com/doc/refman/5.7/en/glossary.html#glos_page_size) to 
8KB or 4KB by specifying the 
[innodb_page_size](https://dev.mysql.com/doc/refman/5.7/en/innodb-parameters.html#sysvar_innodb_page_size)
 option when creating the MySQL instance, the maximum length of the index key 
is lowered proportionally, based on the limit of 3072 bytes for a 16KB page 
size. That is, the maximum index key length is 1536 bytes when the page size is 
8KB, and 768 bytes when the page size is 4KB.
   
   How on earth shoudl I decide on the column size when I am designing my 
database???????  For database - basicallly they force me to use autoincremented 
ids as primary keys. This was maybe good idea in 1980s - but this is an 
antipattern now and it does not really help when you want to have unique 
indexes on the actual fields..... because this apply to ALL indexes  -not only 
primary keys. How do I make sure my long text column is unique together with 
another long text column? Should I write a TRIGGER to protect against accidenal 
entering of non-unique value ?
   
   But the real fun thing starts with 8. Bear with me. They had a chance to fix 
it all and they screwed up even more. Instead of simplifying it, they 
complicated it even further and made even more unobvious choices and 
deliberately forward-incompatible ones.
   
   8: Default encoding for 8 utf8mb4 and collation utf8mb4_0900_ai_ci. But all 
the limits did not change. This means that when previously your DB schema 
installation worked on 5.7 no "stock" mysql (latin2) - they suddenly might stop 
working on 8 (because effective size of the index decreased). For us it means 
that if we did not use the (utf8mb3 id collation trick) - users who previously 
installed Airlfow on MySQL 5.7 with default settings (vast majo rity of people 
do not change the encoding/collation and use default) suddenly it would stop 
working. This is why actually we implemented the trick. Because suddenly people 
started to raise issues that Airflow cannot be installed on MySQL 8. Very nice 
"feature"- thank you Oracle. But this is nothing yet. They actually built-in 
another trap for their unsuspected users (and us). If you are a bit "smarter 
admin" and in the past you actually did the right thing and chose "utf8" as a 
character set (sounds legit like?) then ... Bear with me. In 
 MySQL 8 you end up with..... utf8mb3 ... WHAT? Default is 'utf8mb4', but 
'utf8' is actually utf8mb3 .... But this is not the worst part. The worst part 
is deeply hidden in the documentation.
   
   https://dev.mysql.com/doc/refman/8.0/en/charset-unicode.html
   
   > utf8: An alias for utf8mb3. In MySQL 8.0, this alias is deprecated; use 
utf8mb4 instead. utf8 is expected in a future release to become an alias for 
utf8mb4.
   
   Let that sink in for a while. .......
   
   There are no plans to increase the index size in the future versions. And 
what this basically means that if you use 'utf8' as your character set and 
migrate to MySql 10 (assuming this will be 10) - then, yes, you guessed it , 
Airflow database will stop working because suddenly the same utf8 you had 
before will have smaller indexes to use.
   
   This is a trap. Again. 
   
   ----------------
   
   End of a RANT.
   
   Sorry @pingzh but - I lost  a lot of hair because of that already and 
whenever I can I try to convince everyone - if you only can, switch to Postgres.
   
   BTW. You know that Postgres has virtually no limits on sizes of the indexes. 
do you? 
   
   > There is no PostgreSQL-imposed limit on the number of indexes you can 
create on a table. Of course, performance may degrade if you choose to create 
more and more indexes on a table with more and more columns. PostgreSQL has a 
limit of 1GB for the size of any one field in a table.
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to