potiuk commented on code in PR #25938:
URL: https://github.com/apache/airflow/pull/25938#discussion_r954956657
##########
docs/apache-airflow/installation/upgrading.rst:
##########
@@ -55,16 +67,143 @@ Sample usage:
``airflow db upgrade --revision-range "e959f08ac86c:142555e44c17"``
-Migration best practices
-========================
+Handling migration problems
+===========================
+
+
+Wrong Encoding in MySQL database
+................................
+
+If you are using old Airflow 1.10 as a database created initially either
manually or with previous version of MySQL,
+depending on the original character set of your database, you might have
problems with migrating to a newer
+version of Airflow and your migration might fail with strange errors ("key
size too big", "missing indexes" etc).
+The next chapter describes how to fix the problem manually.
Review Comment:
I don't think this is really 1.10-specific. The way we treated encoding for
IDs changed over time - at one point it was recommended, and later it became
automated/mandatory - so I think the outcome depends more on when you
performed certain steps than on Airflow 1.10 vs Airflow 2.

It's hard to wrap your head around the number of possible scenarios users
might have here, especially when you add the fact that people could have (and
did) migrate from MySQL 5.7 to MySQL 8 somewhere between 1.10 and 2.4.0, which
could have changed the default encoding (from latin1/swedish to utf8), or
they could have a completely different encoding set in general.

Then - depending on which version you were at when you did the migration - you
could end up with a pretty much random set of intermixed encodings in your
columns. That's why I ended up with "if you have a problem, you need to end up
in this state, and here are useful hints that can bring you to the right state"
rather than trying to fix it automatically. Also see the answer I gave to
@itayB in
https://github.com/apache/airflow/discussions/25866#discussioncomment-3474089
on why we do not try to automate the fix :)

What we have currently is that the encoding is set to `utf8mb3` automatically
for those ID fields if you do not override it. So once you fix it, the problem
should not happen again in the future.

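To make the "right state" concrete, here is a minimal sketch (not part of
Airflow) of how one could list ID columns that are not in `utf8mb3`. The query
reads MySQL's `information_schema.COLUMNS`; the `ID_COLUMNS` set, the
`ColumnInfo` type and the `find_mismatched_id_columns` helper are hypothetical
names chosen for illustration, and the set of ID columns is only a sample that
you would need to adjust to your own schema.

```python
from typing import Iterable, List, NamedTuple

# Read-only query against MySQL's information_schema; %s is the schema name.
COLUMN_CHARSET_QUERY = """
SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = %s AND CHARACTER_SET_NAME IS NOT NULL
"""

# `utf8` is currently an alias for `utf8mb3`, so both spellings are accepted.
EXPECTED_CHARSETS = frozenset({"utf8", "utf8mb3"})

# Assumption: a sample of ID column names to check, not an exhaustive list.
ID_COLUMNS = frozenset({"dag_id", "task_id", "run_id"})


class ColumnInfo(NamedTuple):
    table: str
    column: str
    charset: str
    collation: str


def find_mismatched_id_columns(rows: Iterable[tuple]) -> List[ColumnInfo]:
    """Return ID columns whose character set is not utf8mb3 (or its alias)."""
    mismatched = []
    for row in rows:
        info = ColumnInfo(*row)
        if info.column not in ID_COLUMNS:
            continue  # only the ID fields are relevant here
        if info.charset.lower() not in EXPECTED_CHARSETS:
            mismatched.append(info)
    return mismatched
```

With a DB-API driver that uses the `%s` parameter style, the rows could come
from `cursor.execute(COLUMN_CHARSET_QUERY, (schema_name,))` followed by
`cursor.fetchall()`; an empty result from the helper would suggest the checked
ID columns are already in the expected state.
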
Small rant though... With MySQL 9 or 10 this might come back and hit us
again. But then we should likely only enable MySQL 9 or 10 (if we decide to)
once we have tested the migration scenarios. The problem with a successor of
MySQL 8 is that `utf8` (which is not the default encoding - `utf8mb4` is) is
currently an alias for `utf8mb3`, but the MySQL documentation mentions that in
future versions the `utf8` alias will point to `utf8mb4` and ... `utf8mb3`
will be gone. Officially `utf8mb3` is deprecated now. I have no idea how we
are going to handle this, because it opens up a whole host of scenarios again.

But I decided not to worry about this. I sincerely think we should drop
MySQL at some point, and the lack of support for whatever new version of MySQL
comes (if it comes) might be a good way to handle dropping MySQL altogether.
I even think that instead of supporting MySQL 9 (or whatever) we should
develop a tool that will help people migrate from MySQL to Postgres.
Let's see how much less MySQL usage we will have in the next survey.