TLDR
* changing handling of try_number in
https://github.com/apache/airflow/pull/39336
* no more private attr
* no more getter that changes value based on state of task
* no more decrementing
* try number now only handled by scheduler
* hope that sounds good to all of you

For more detail read on...

In https://github.com/apache/airflow/pull/39336 I am doing some work to
resolve some longstanding pain and frustration caused by try_number.

The way we handle try_number has for quite some time been messy and
problematic.

For example, if you access `ti.try_number` and then change the state to or
from RUNNING, you will get a different value if you access it again!
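
To make that concrete, here is a toy sketch of a state-dependent getter.
It is not the actual Airflow code: the class name, the string states, and
the exact "+1 unless running" rule are assumptions made for illustration.

```python
class ToyTaskInstance:
    """Toy stand-in for a task instance; not Airflow's real class."""

    def __init__(self):
        self._try_number = 0  # private counter behind the public getter
        self.state = None

    @property
    def try_number(self):
        # Assumed rule: while running, report the stored value;
        # in any other state, report the value the next try would get.
        if self.state == "running":
            return self._try_number
        return self._try_number + 1


ti = ToyTaskInstance()
ti._try_number = 1
ti.state = "running"
while_running = ti.try_number  # read during the first try

ti.state = "success"
after_finish = ti.try_number   # same stored counter, different answer
```

Nothing touched the stored counter between the two reads, yet the reported
value differs, which is the surprise described above.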

And the responsibility for managing this number has been distributed
throughout the codebase.  For example, the task itself always increments
try_number when it starts running.  But then if it defers or reschedules
itself, it decrements it back down so that when it runs again and naively
increments, the number comes out right again.
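
A toy model of that increment/decrement dance might look like the sketch
below.  The class and method names are hypothetical and only mirror the
description above, not the real code paths.

```python
class OldStyleTask:
    """Toy model of the old bookkeeping; names are hypothetical."""

    def __init__(self):
        self.try_number = 0

    def start(self):
        # The task always increments when it starts running.
        self.try_number += 1

    def defer(self):
        # Undo the increment so the next start() lands on the same number.
        self.try_number -= 1


task = OldStyleTask()
task.start()                    # first try begins
task.defer()                    # counter is walked back while deferred
deferred_value = task.try_number
task.start()                    # resumes and naively increments again
resumed_value = task.try_number
```

The final number is correct only because every deferral remembers to
decrement; any code path that forgets leaves the counter off by one.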

Recently more issues have become visible as I have worked on AIP-44.
For example, pydantic does not like private attrs, and it's awkward to
know *what value to use* when serializing try_number when the TI will
give you a different answer depending on the state of the task!

And there's yet another edge case being solved in this community PR
<https://github.com/apache/airflow/pull/38984#issuecomment-2090944403>.
And once we start looking at try history and AIP-64, that also forces a
look at this.

So it all sounds bad, and indeed it is bad, but I think I have a solution.

What I do is have the scheduler increment try_number at the moment
when it schedules the task.  It alone will have the responsibility for
incrementing try_number.  And nowhere will it ever be decremented.  It
will not be incremented when resuming after deferral or reschedule.  And
that's about all there is to it.
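
As a rough sketch of the proposed model (again a toy, with made-up
function names and string states rather than anything taken from the PR):

```python
class ToyTI:
    """Toy task instance with a plain public counter."""

    def __init__(self):
        self.try_number = 0
        self.state = None


def schedule(ti):
    # The scheduler is the only place the counter is incremented.
    ti.try_number += 1
    ti.state = "scheduled"


def run(ti):
    ti.state = "running"    # starting does not touch try_number


def defer(ti):
    ti.state = "deferred"   # no decrement


def resume(ti):
    ti.state = "running"    # resuming is not a new try, so no increment


ti = ToyTI()
schedule(ti)
run(ti)
defer(ti)
resume(ti)
first_try = ti.try_number   # still the first try after the deferral

ti.state = "up_for_retry"
schedule(ti)                # only a new scheduling bumps the number
run(ti)
second_try = ti.try_number
```

Because the counter is a plain attribute that only moves in `schedule`,
reading it gives the same answer regardless of the task's state.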

I've tested it out and it works.  But I'm working through many test
failures that need to be resolved (there are lots of asserts on
try_number).

One small thing I want to point out: if a user was previously calling
`task.run()` manually, without the task having been scheduled by the
scheduler, their try_number will no longer be automatically incremented.
The same goes for `airflow tasks run`, because the responsibility now
lies solely with the scheduler.  But airflow was never designed to assume
that tasks will be run without having been scheduled, so I do not think
this counts as a breaking change or a blocker.

Thanks for the consideration.  Let me know if you have any concerns.
