potiuk edited a comment on pull request #19637: URL: https://github.com/apache/airflow/pull/19637#issuecomment-972673238
Side comment. I think this is a great discussion and we should have more of those, as it lets us hear and consider different perspectives.

@uranusjr - I quite agree the switch makes no sense in its current proposed form, since we can have the command to reserialize after migration.

@kaxil - I hear you, and I understand your perspective of the "power users". I agree that reserializing everything may be a "bulk and heavy" solution, so let's indeed try to find a better one that will be good for all kinds of users - not only the "power" users.

We have to remember one thing as well. Even the "power users" have teams of "ops" people that rotate and change. For those "experienced" teams, a lot of knowledge and experience leaves the team when people leave it. So if we are helping the "new" users deal with problems, we are also helping "new members" of such "power ops teams". That's why I think ALSO focusing on the needs of those inexperienced users actually helps everyone.

What I am **really** after is that we think of those users when we deal with problems, and if we know of some class of problems we should - ideally - make a self-healing solution. If we can't, we should make sure that we give the user a good and **reassuring** message about the problem and guidance on how they can deal with it. This is something we do very, very poorly, I think. What we are doing right now is giving the user a meaningless message that - if they are inexperienced - scares them more than it helps them. I think we should be super empathetic towards those people.

The problem I really want to solve (and maybe indeed we can find a good solution for everyone) is what happened in the past (and I guess it will happen even now if the user has a serialization problem). The screenshots below are from yesterday's Slack conversation in `troubleshooting`: https://apache-airflow.slack.com/archives/CCQ7EGB1P/p1637144816451900.
They are about pickling and a different Python version, but they illustrate the problem very well, as this is pretty much the same "kind of issue": one that is self-healable and repeats often enough that we should really consider it a "common problem" and find a good "automated" solution for it. The message that the user got was something like this:

What do you expect the user will do in this case - experienced or not? Yeah. The user will actually create an issue. None of us (including the user) wants to create the issue - yet we "trick" the user into creating one, which takes both user and maintainer time completely needlessly.

I want either:

a) (best!) Airflow self-healing, with the message reassuringly saying "This is a known issue and we are fixing it automatically. Please try again in 5 minutes and it will be solved"

b) (a little worse but acceptable!) the user gets a clear message on what to do - instructions providing context of the problem and enough information that the user can make a decision on their own without involving maintainers. This is what was missing originally in the "moved_task_instance" case and what I added [here](https://airflow.apache.org/docs/apache-airflow/stable/installation/upgrading.html#post-upgrade-warnings) - and the "warning" message links to it.

This is a well-known and often occurring situation. Seeing a cryptic error - rather than a reassuring message and helpful instructions - creates the impression that Airflow is not enterprise-grade software. Those kinds of errors are very bad, because there are only two options when you see one: either "I screwed up" or "they screwed up". Both thoughts have bad consequences.

So what I am really after is an answer to a simple question.
Can we make sure that in case of serialization errors (and maybe we do the same with pickling errors while we are at it) the messages the user sees are reassuring, and the system either self-heals or the user gets clear instructions on what they should do - including context on why, and the ability to decide on their own what to do without involving maintainers?

If the answer to this is "yes" and we implement it, then I think we do not need this automated reserialization.
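
To make options a) and b) a bit more concrete, here is a minimal sketch of the pattern. It is not Airflow code: `load_stored_dag`, `StoredDagUnreadable`, the `reserialize` callback and the docs URL are all hypothetical names made up for illustration, and `pickle` merely stands in for whatever serialization is actually in play.

```python
import pickle

# Hypothetical docs location, used only for illustration.
DOCS_URL = "https://example.invalid/upgrading#serialization-errors"


class StoredDagUnreadable(Exception):
    """Carries a reassuring, actionable message instead of a raw traceback."""


def load_stored_dag(blob, reserialize=None):
    """Return the object stored in ``blob``.

    ``reserialize`` is a hypothetical callback that rebuilds the stored bytes
    from the DAG source; pass None when no automatic repair is available.
    """
    try:
        return pickle.loads(blob)
    except Exception as err:  # unpickling can raise many exception types
        if reserialize is not None:
            # Option a): self-heal by rebuilding the stored copy, then retry once.
            return pickle.loads(reserialize())
        # Option b): say what happened, why, and what the user can do about it.
        raise StoredDagUnreadable(
            "The stored copy of this DAG could not be read. This is a known "
            "situation, for example after an upgrade or a Python version "
            f"change. See {DOCS_URL} for how to regenerate the stored copy."
        ) from err
```

The point of the sketch is only the shape of the `except` branch: try the automatic repair first, and when that is not possible, raise something whose text gives context and a next step rather than a bare stack trace.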
