[GitHub] [airflow] potiuk commented on a diff in pull request #25938: Add instructions on manually fixing MySQL Charset problems

GitBox Thu, 25 Aug 2022 06:18:19 -0700


potiuk commented on code in PR #25938:
URL: https://github.com/apache/airflow/pull/25938#discussion_r954956657



##########
docs/apache-airflow/installation/upgrading.rst:
##########
@@ -55,16 +67,143 @@ Sample usage:
    ``airflow db upgrade --revision-range "e959f08ac86c:142555e44c17"``
 
 
-Migration best practices
-========================
+Handling migration problems
+===========================
+
+
+Wrong Encoding in MySQL database
+................................
+
+If you are using old Airflow 1.10 as a database created initially either manually or with previous version of MySQL,
+depending on the original character set of your database, you might have problems with migrating to a newer
+version of Airflow and your migration might fail with strange errors ("key size too big", "missing indexes" etc).
+The next chapter describes how to fix the problem manually.

Review Comment:
   I don't think this is really 1.10-specific. The way we treated encoding for 
IDs changed over time: at some point it was recommended, and later it became 
automated/mandatory. I think the outcome depends more on when you performed 
certain steps than on Airflow 1.10 vs. Airflow 2.
   
   It's hard to wrap your head around the number of possible scenarios users 
might have here, especially when you add the fact that people could have (and 
did) migrate from MySQL 5.7 to MySQL 8 somewhere between 1.10 and 2.4.0, which 
could have changed the default encoding (from latin1/swedish to utf8mb4), or 
they could have a completely different encoding set in general. 
   
   Then, depending on which version you were on when you did the migration, you 
could end up with a pretty much random set of intermixed encodings in your 
columns. That's why I ended up with "if you have a problem, you need to end up 
in this state, and here are useful hints that can bring you to the right state" 
rather than trying to fix it automatically. Also see the answer I gave to 
@itayB in 
https://github.com/apache/airflow/discussions/25866#discussioncomment-3474089 
on why we do not try to automate the fix :)
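
   To see which of those states a given database is in, it can help to list 
the character set and collation of every text column and group them. The 
following is only a minimal sketch: the query reads MySQL's 
`information_schema.COLUMNS`, the `%s` placeholder assumes a DB-API driver 
with that parameter style, and the helper names and sample rows are 
hypothetical, not part of Airflow.

```python
from collections import defaultdict

# Lists the charset/collation of every text column in one schema.
# The schema name is bound as a parameter by the (assumed) DB-API driver.
COLUMN_CHARSET_QUERY = """
SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = %s AND CHARACTER_SET_NAME IS NOT NULL
ORDER BY TABLE_NAME, ORDINAL_POSITION
"""


def group_columns_by_charset(rows):
    """Group (table, column, charset, collation) rows by (charset, collation)."""
    groups = defaultdict(list)
    for table, column, charset, collation in rows:
        groups[(charset, collation)].append(f"{table}.{column}")
    return dict(groups)


def has_mixed_encodings(rows):
    """Return True when the columns use more than one charset/collation pair."""
    return len(group_columns_by_charset(rows)) > 1


# Hypothetical rows, shaped like the output of COLUMN_CHARSET_QUERY.
SAMPLE_ROWS = [
    ("dag_run", "dag_id", "utf8mb3", "utf8mb3_bin"),
    ("dag_run", "run_id", "latin1", "latin1_swedish_ci"),
    ("task_instance", "task_id", "utf8mb3", "utf8mb3_bin"),
]

if __name__ == "__main__":
    for (charset, collation), columns in group_columns_by_charset(SAMPLE_ROWS).items():
        print(charset, collation, columns)
    print("mixed encodings:", has_mixed_encodings(SAMPLE_ROWS))
```

   With a real connection you would run `COLUMN_CHARSET_QUERY` through a 
cursor and pass the fetched rows to `group_columns_by_charset`. More than one 
group among the ID columns is a hint that the manual fix may be needed.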
   
   What we have currently is that the encoding is set to utf8mb3 automatically 
for those ID fields if you do not override it. So once you fix it, it should 
not happen again in the future.
   
   Small rant though... With MySQL 9 or 10 this might come back to hit us 
again. But then we should likely only enable MySQL 9 or 10 (if we decide to) 
once we test the migration scenarios. The problem with the successor of MySQL 8 
is that `utf8` (which is not the default encoding - utf8mb4 is) is currently an 
alias for `utf8mb3`. But the MySQL documentation mentions that in future 
versions the `utf8` alias will point to `utf8mb4` and ... utf8mb3 will be gone. 
Officially `utf8mb3` is deprecated now. I have no idea how we are going to 
handle this, because this opens up a whole host of scenarios again.
   
   But I decided not to worry about this. I sincerely think we should drop 
MySQL at some point, and maybe lack of support for whatever new version of 
MySQL comes (if it comes) might be a good way to handle dropping MySQL 
altogether. I even think that instead of supporting MySQL 9 (or whatever) we 
should develop a tool that will help people migrate from MySQL to Postgres. 
   
   Let's see how much less of MySQL we will have in the next survey.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
