potiuk edited a comment on pull request #19637:
URL: https://github.com/apache/airflow/pull/19637#issuecomment-972673238


   Side comment. I think this is a great discussion and we should have more of 
those as it lets us hear and consider different perspectives.
   
   @uranusjr - I quite agree the switch makes no sense in its current proposed 
form - since we can have the command to reserialize after migration.
   
   @kaxil  - I hear you, and I understand your perspective of the "power 
users". I agree that reserializing everything may be a "bulk and heavy" 
solution, so let's indeed try to find a better one that is good for all kinds 
of users - not only the "power" users. We have to remember one thing as well: 
even the "power users" have teams of "ops" people that rotate and change. For 
those "experienced" teams, a lot of knowledge and experience leaves with the 
people who leave the team. So if we are helping the "new" users to deal with 
problems, we are also helping "new members" of such "power ops teams". That's 
why I think ALSO focusing on the needs of those inexperienced users actually 
helps everyone. 
   
   What I am **really** after is that we think of those users when we deal with 
problems, and if we know of some class of problems we should - ideally - make a 
self-healing solution. If we can't - we should make sure that we give the user 
a good and **reassuring** message about the problem and guidance on how they 
can deal with it. This is something we do very, very poorly I think. What we 
are doing right now is giving the user a meaningless message that - if they 
are inexperienced - scares them more than it helps them. I think we should be 
super empathetic towards those people. 
   
   The problem I really want to solve (and maybe indeed we can find a good 
solution for everyone) is what happened in the past (and I guess it will 
happen even now if the user has a serialization problem). The screenshots 
below are from yesterday's Slack conversation in `troubleshooting`: 
https://apache-airflow.slack.com/archives/CCQ7EGB1P/p1637144816451900. They are 
about pickling and different Python versions, but they illustrate the problem 
very well, as this is pretty much the same "kind of issue": it is self-healable 
and repeats often enough that we should really consider it a "common problem" 
and find a good "automated" solution for it.
   
   The message that the user got was something like this:
   
   ![image 
(1)](https://user-images.githubusercontent.com/595491/142383080-912d00b1-70e0-49b7-a01c-0176d7c9ca5a.png)
   ![image 
(2)](https://user-images.githubusercontent.com/595491/142384191-47001345-32d4-432c-8ed7-b111716b0563.png)
   
   What do you expect the user will do in this case - experienced or not? 
   
   Yeah. The user will actually create an issue. 
   
   None of us (including the user) wants an issue to be created - yet we 
"trick" the user into creating one, which needlessly takes both user and 
maintainer time. I want either:
   
   a) (best!) Airflow healing itself and the message reassuringly saying 
"This is a known issue and we are fixing it automatically. Please try again in 
5 minutes and it will be solved. If not, then <create an issue>"
   
   b) (a little worse but acceptable!) the user gets a clear message on what to 
do - instructions providing the context of the problem and enough information 
that the user can make a decision on their own without involving maintainers. 
This is what was missing originally in the "moved_task_instance" case and what I added 
[here](https://airflow.apache.org/docs/apache-airflow/stable/installation/upgrading.html#post-upgrade-warnings)
 - and the "warning" message links to it. 
   
   This is a well-known and frequently occurring situation. Seeing a cryptic 
error - rather than a reassuring message and helpful instructions - creates the 
impression that Airflow is not enterprise-grade software. Those kinds of errors 
are very bad - because there are only two options when you see them: either 
"I screwed up" or "they screwed up". Both thoughts have bad consequences.
   
   So what I am really after is an answer to a simple question.
   
   Can we make sure that in case of serialization errors (and maybe we do the 
same with pickling errors while we are at it) - the messages the user sees are 
reassuring, and the system either self-heals or the user gets clear 
instructions on what they should do - including the context why, and the 
ability to decide on their own what to do without involving maintainers?
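   
   As a rough illustration of options a) and b) above, a minimal Python sketch 
could look like the following. Everything in it is hypothetical: 
`load_serialized`, `RecoverableSerializationError`, the `reserialize` callback 
and `HELP_URL` are names made up for this example and are not existing Airflow 
APIs, and plain JSON stands in for whatever serialization format is in use.

```python
import json

# Hypothetical placeholder, not a real documentation URL.
HELP_URL = "https://example.invalid/docs/upgrading#serialization-errors"

USER_MESSAGE = (
    "The stored data could not be read. This is a known situation that "
    "usually appears after an upgrade and it does not mean your data is lost. "
    "Re-run the reserialization step and try again. Context and instructions: "
    "{url}. If the problem persists after that, please create an issue."
)


class RecoverableSerializationError(Exception):
    """Carries a user-facing explanation; the raw cause is chained to it."""


def load_serialized(raw, reserialize=None):
    """Deserialize `raw`, trying to self-heal before bothering the user.

    `reserialize` is an optional (hypothetical) callback that regenerates
    the serialized payload from its source and returns it as a string.
    """
    try:
        return json.loads(raw)
    except ValueError as first_error:
        # Option a): try to heal automatically when we know how to.
        if reserialize is not None:
            try:
                return json.loads(reserialize())
            except ValueError:
                pass  # Healing failed; fall through to the guided message.
        # Option b): explain what happened and what the user can do.
        raise RecoverableSerializationError(
            USER_MESSAGE.format(url=HELP_URL)
        ) from first_error
```

   The original exception stays attached as the cause, so maintainers still 
get the full traceback if an issue is eventually opened, while the first thing 
the user reads is the explanation and the next step.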
   
   If the answer to this is "yes" and we implement it - then I think we do not 
need this automated reserialization.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

