[GitHub] [airflow] potiuk edited a comment on pull request #19637: Trigger DAG re-serialization after upgrade

GitBox Thu, 18 Nov 2021 01:20:05 -0800


potiuk edited a comment on pull request #19637:
URL: https://github.com/apache/airflow/pull/19637#issuecomment-972673238

Side comment. I think this is a great discussion and we should have more of
those as it lets us hear and consider different perspectives.

@uranusjr, I quite agree that the switch makes no sense in its current proposed
form, since we can run the command to reserialize after migration.

@kaxil, I hear you, and I understand your perspective on the "power
users". While I agree that reserializing everything may be a "bulk and heavy"
solution, let's try to find a better solution that works for all kinds of
users, not only the "power" users. We also have to remember one thing: even
"power users" have teams of "ops" people who rotate and change. In those
"experienced" teams, a lot of knowledge and experience lives in people's heads
and leaves the team when they do. So if we help "new" users deal with
problems, we are also helping "new members" of such "power ops teams". That's
why I think ALSO focusing on the needs of inexperienced users actually helps
everyone.

What I am **really** after is that we think of those users when we deal with
problems, and if we know of some class of problems, we should (ideally) build a
self-healing solution. If we can't, we should make sure we give the user a
good and **reassuring** message about the problem and guidance on how they can
deal with it. This is something I think we do very, very poorly. Right now we
give the user a meaningless message that, if they are inexperienced, scares
them more than it helps them. I think we should be super empathetic towards
those people.

The problem I really want to solve (and maybe we can indeed find a good
solution for everyone) is what happened in the past (and I guess it will
happen even now if the user has a serialization problem). The screenshots
below are from yesterday's Slack conversation in `troubleshooting`:
https://apache-airflow.slack.com/archives/CCQ7EGB1P/p1637144816451900. They are
about pickling and different Python versions, but they illustrate the problem
very well, as this is pretty much the same "kind of issue": it is self-healable
and repeats often enough that we should really consider it a "common problem"
and find a good "automated" solution for it.

The message the user saw was something like this:

![image (1)](https://user-images.githubusercontent.com/595491/142383080-912d00b1-70e0-49b7-a01c-0176d7c9ca5a.png)
![image (2)](https://user-images.githubusercontent.com/595491/142384191-47001345-32d4-432c-8ed7-b111716b0563.png)

What do you expect the user to do in this case, whether experienced or not?

Yeah. The user will actually create an issue.

None of us (including the user) wants that issue to be created, yet we "trick"
the user into creating one, which takes up both the user's and the maintainers'
time completely needlessly. I want either:

a) (best!) Airflow heals itself and the message reassuringly says: "This is a
known issue and we are fixing it automatically. Please try again in 5 minutes;
it will be solved."

b) (a little worse but acceptable!) the user gets a clear message on what to
do: instructions that provide the context of the problem and enough
information for the user to make a decision on their own without involving
maintainers. This is what was originally missing in the "moved_task_instance"
case and what I added
[here](https://airflow.apache.org/docs/apache-airflow/stable/installation/upgrading.html#post-upgrade-warnings),
and the "warning" message links to it.
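To make options a) and b) more concrete, here is a minimal Python sketch. It is not Airflow's code: `load_dag`, `FriendlyDagLoadError`, the dict-based `store` and the `reserialize` callback are all hypothetical stand-ins for the serialized-DAG table and the DAG-file parser. The sketch tries once to self-heal a broken stored DAG (option a) and, if that fails, raises an error that explains the context and links to the post-upgrade warnings page instead of showing a bare traceback (option b).

```python
import json
import logging
from typing import Callable, Dict

log = logging.getLogger("serialized_dag_sketch")

# Post-upgrade guidance page linked in this comment.
DOCS_URL = (
    "https://airflow.apache.org/docs/apache-airflow/stable/"
    "installation/upgrading.html#post-upgrade-warnings"
)


class FriendlyDagLoadError(Exception):
    """Hypothetical error for option (b): context and guidance, not a cryptic trace."""


def load_dag(dag_id: str, store: Dict[str, str], reserialize: Callable[[str], str]) -> dict:
    """Hypothetical loader: read a stored DAG, self-healing once on failure (option a)."""
    try:
        return json.loads(store[dag_id])
    except (KeyError, json.JSONDecodeError) as first_error:
        # Reassuring message instead of a scary traceback.
        log.warning(
            "The stored version of DAG %r could not be read. This is a known "
            "post-upgrade issue; re-serializing it from the DAG file automatically.",
            dag_id,
        )
        try:
            store[dag_id] = reserialize(dag_id)
            return json.loads(store[dag_id])
        except Exception as second_error:
            # Self-healing failed: fall back to clear, actionable instructions.
            raise FriendlyDagLoadError(
                f"DAG {dag_id!r} could not be re-serialized automatically "
                f"({second_error}). This usually happens after an upgrade and is "
                f"not caused by your DAG code. See {DOCS_URL} for what to do next."
            ) from first_error
```

The sketch retries exactly once, so a persistent failure still surfaces quickly, but with context and a link rather than a cryptic error.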

This is a well-known and frequently occurring situation. Seeing a cryptic
error, rather than a reassuring message and helpful instructions, creates the
impression that Airflow is not enterprise-grade software. Those kinds of errors
are very bad, because there are only two options when you see one: either "I
screwed up" or "they screwed up". Both thoughts have bad consequences.

So what I am really after is an answer to a simple question.

Can we make sure that, in case of serialization errors (and maybe pickling
errors too, while we are at it), the messages the user sees are reassuring, and
the system either self-heals or the user gets clear instructions on what to
do, including the context of why and enough information to decide on their
own what to do without involving maintainers?

If the answer to this is "yes" and we implement it, then I think we do not
need this automated reserialization.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
