Yep I think we are all converging. I **think** the context manager is good (and I saw it initially in the doc from Daniel) and I tend to agree this (or similar) syntactic sugar will be the way people will interact with setup/teardown.
I personally believe there are two slightly independent streams here a) how dag authors will interact with setup/teardown to express them in the DAG b) how it is modelled in our DAG structure/task relations I think Daniel's "modification" of the AIP is really about b) - in terms of re-using a lot of what we currently have in terms of triggering rules and task relations to model the setup/teardown behaviour where what Ash is concerned about (I am a little concerned too) is a) -> how dag authors will write their dags (and yes I imagine they will use the context managers in vast majority of cases). But I think both approaches can be combined. I personally think we should encourage the context manager use case for regular use, but leave the direct manipulation of dependencies for power users as well. I am also in favour of cutting short and just failing DAGs that have dangling setup/teardown to cut some confusion and implicitness it brings in case it happens in the power-user scenario. This - I think should be a versatile and future proof (but deliverable in a rather short time) approach and I tend to agree with Daniel that multi-setup/multi-teardown will make the feature much more useful and we will need it anyway. Also by piggybacking on the existing relations/trigger rule mechanism we might get it implemented much faster. BTW. I also think that this is much more straightforward approach: with setup_teardown(my_setup, my_teardown): my_work >> my_other_work or even that: with setup_teardown([my_setup1, my_setup2], [my_teardown]): my_work >> my_other_work J. On Fri, Mar 24, 2023 at 5:28 PM Daniel Standish <daniel.stand...@astronomer.io.invalid> wrote: > > Just want to thank you Ash for seriously engaging with the proposal and > trying to find a solution to your concerns. I am optimistic that we can > find common ground and get this feature out there. > > OK so with that dag example we looked at very similar examples. I'd love > it if we could do this (and it's mentioned in the... very long proposal) > > chain( > create_notification_channel.as_setup(), > enable_notification_channel.as_setup(), > disable_notification_channel, > list_notification_channel, > create_alert_policy.as_setup(), > enable_alert_policy.as_setup(), > disable_alert_policy, > list_alert_policies, > delete_notification_channel.teardown_for(create_notification_channel), > delete_notification_channel_2.teardown_for(enable_notification_channel), > delete_alert_policy.teardown_for(create_alert_policy), > delete_alert_policy_2.teardown_for(enable_alert_policy), > ) > > So basically you could take an existing dag, not have to refactor it at > all, and just convert the relevant tasks to be setup / teardown. > > Re context managers, we also considered this and included it in the > proposal ("other ideas under consideration") and I think the idea is > definitely promising. > > The example in the doc is this: > > with setup_teardown(my_setup, my_teardown): > my_work >> my_other_work > > Now, I'm not sure it's practical to forbid users from wiring things up > manually, though we could explore that. I would be in favor of encouraging > but not requiring. But if requiring is a compromise that will get us over > the hump then maybe that's good enough, because we could always consider > removing the constraint at a future date. > > So basically what you're saying is let's disallow `setup1 >> work` directly > and require it be defined with a context manager... Would have to come up > with something for unmatched setup / unmatched teardown. > > I'm not sure about this syntax though: > with create_notification_channel >> [ > delete_notification_channel, > delete_notification_channel_2, > ]: > > Seems it might not always be obvious where to insert the "work"... > > Crucially I want us to not let perfect be the enemy of good, and all this > > confusion and discussion is exactly why I had originally placed "multiple > > setup/teardown" in future work. Having a single setup function and a single > > task group gives our users so much more power than they have right now. > > I want this released, and keeping it simpler means we get it out sooner. > > The last thing I want is a repeat of the DAG versioning API that Kaxil and > > I tried where we got blocked. > > > I understand the goal of deferring multiple setup / teardown, i.e. being > incremental and MVP and all that. But the thing is, I think it's > absolutely essential that our choices do not cause trouble for multiple > setup / teardowns in the future. So even if we postpone enablement of > multiple, or put it behind an experimental flag, I do not think we can > postpone *consideration* of multiple -- before we release this I think we > need to know roughly how we're going to enable that or else the feature is > doomed. We can't just say we'll deal with it later and paint ourselves > into a corner. That's why I have laid out a solution with a vision for how > we do multiple. Maybe there's a better solution out there and if there is > I am happy to support it > > > > > > > > > On Fri, Mar 24, 2023 at 8:12 AM Ash Berlin-Taylor <a...@apache.org> wrote: > > > Okay, after chatting with TP a bit this morning (due to being easy for me > > to grab on Slack in an overlapping timezone) I think I've realised our > > disconnect. > > > > We both want explicit, but what we see as explicit is different! > > > > To me, the explicit was "you've entered a task group" (or a DAG, because > > all DAGs have a root task group) -- and that was why I think setup and > > teardown should "bookend" the TG. And I think scope is an important concept > > to introduce for setup and teardown to not be footguns for users learning > > and living with this feature. > > > > TP came up with an alternative suggestion idea that I'm exploring here: > > Let's make the scope explicit by requring setup/teardown tasks to be used > > in a context manager! > > > > My main issue with using a dependendy based approach is exactly the > > complex system test dags -- it's not explicit in the dep chain which tasks > > are special and which are normal. Picking a system test from the Google > > provider at random: > > > > > > https://github.com/apache/airflow/blob/main/tests/system/providers/google/cloud/stackdriver/example_stackdriver.py#L213-L226 > > > > chain( > > create_notification_channel, > > enable_notification_channel, > > disable_notification_channel, > > list_notification_channel, > > create_alert_policy, > > enable_alert_policy, > > disable_alert_policy, > > list_alert_policies, > > delete_notification_channel, > > delete_notification_channel_2, > > delete_alert_policy, > > delete_alert_policy_2, > > ) > > We can guess which of those have special treatment based on the name, but > > that's implicit to me. I can't look at that and know which tasks have > > special behaviour, nor which tasks actually need the "resources" created by > > a setup tsk. I like having a distinct call out in the python code that a > > task is special. > > > > I think this would be much clearer and explicit as something like this: > > > > with create_notification_channel >> [ > > delete_notification_channel, > > delete_notification_channel_2, > > ]: > > ( > > enable_notification_channel > > >> disable_notification_channel > > >> list_notification_channel > > ) > > with create_alert_policy >> [delete_alert_policy, delete_alert_policy_2]: > > enable_alert_policy >> disable_alert_policy >> list_alert_policies > > > > I think that is what the tasks actually need for this dag. Even if it's > > not, I think this is a useful illustrative example. Having the explicit > > with scoping in the DAG file to match the runtime behaviour is a strong > > property that I think is important such that users can understand the > > "lifetime" of resources created by tasks. > > Here's another example, lets say the tasks need both resources, but the > > resources don't actually depend on each other: > > with create_notification_channel >> [ > > delete_notification_channel, > > delete_notification_channel_2, > > ], create_alert_policy >> [delete_alert_policy, delete_alert_policy_2]: > > ( > > enable_notification_channel > > >> disable_notification_channel > > >> list_notification_channel > > ) > > enable_alert_policy >> disable_alert_policy >> list_alert_policies > > > > *** > > The key thing for me: setup and teardown tasks are not normal tasks, and > > shouldn't be used, nor appear, as such in DAG code. > > *** > > > > Crucially I want us to not let perfect be the enemy of good, and all this > > confusion and discussion is exactly why I had originally placed "multiple > > setup/teardown" in future work. Having a single setup function and a single > > task group gives our users so much more power than they have right now. > > I want this released, and keeping it simpler means we get it out sooner. > > The last thing I want is a repeat of the DAG versioning API that Kaxil and > > I tried where we got blocked. > > -ash > > On Mar 24 2023, at 12:24 am, Daniel Standish > > <daniel.stand...@astronomer.io.INVALID> wrote: > > > re > > > > > > > 2. `task1 >> task2 >> teardown_task` to me falsely implies that > > teardown > > > > depends on task2, But it doesn't. It only depends on the "scope being > > > > exited". > > > > > > > > > So that's not quite the case. With the proposed implementation, there's > > no > > > such scope concept. They're just normal tasks, with special features. > > > With the above code, teardown_task does depend on task2. A teardown will > > > run if its setup is successful and its non-setup upstreams are done. > > > > > > re this one: > > > with TaskGroup("tg1"): > > > > task1 >> teardown_task > > > > task2 >> task3 >> task4 >> task5 > > > > > > > > > With the proposed implementation, teardown runs after task 1 and doesn't > > > wait for task2 or its downstreams. And the same rule applies. Teardown > > > will run if its setups are successful and non-setup upstreams are done. > > In > > > this case there's no setup so it is in effect all_done. > > > > > > And one might ask, what's the purpose / benefit of having a teardown > > here? > > > Well, it's just that task1 requires the teardown and the others don't. > > And, > > > as a teardown, (1) even if it is happens to be a dag leaf, it can be > > > ignored for the purpose of dag run state and (2) teardowns are ignored > > as a > > > task group leaf when arrowing tg1 >> next_task so that its success is not > > > required for the dag to continue running after task1 is done. > > > > > > Adding a setup to the example doesn't change too much: > > > with TaskGroup("tg1"): > > > setup1 >> task1 >> teardown_task > > > setup1 >> teardown_task > > > task2 >> task3 >> task4 >> task5 > > > > > > So in this case we still just have two parallel sequences of tasks. > > > teardown_task will run if setup1 is successful and task1 is done. > > > > > > task2 and downstreams don't care about task 1 and its setups / teardowns. > > > They are very much normal tasks that operate in the same way but you get > > > the power of the special behaviors re clearing, configurable dag run > > state > > > impact, and continuing downstream in spite of failure. > > > > > > If task2 etc require the setup then you just add the arrows. > > > > > > > > > > > > > > > > > > > > > > > > On Thu, Mar 23, 2023 at 4:12 PM Ash Berlin-Taylor <a...@apache.org> > > wrote: > > > > I'm obviously in favour of the way the AIP was written, and that's for > > two > > > > primary reasons. > > > > > > > > 1. It's analogous to setup and teardown in testing frameworks where you > > > > don't ever explicitly call them - the framework handles it for you. > > > > 2. `task1 >> task2 >> teardown_task` to me falsely implies that > > teardown > > > > depends on task2, But it doesn't. It only depends on the "scope being > > > > exited". > > > > > > > > And as for the Zen of Python point: python itself doesn't even follow > > them > > > > well. There are three ways of formatting strings in python. > > > > > > > > On thinking a bit more about it, I think I have a counter point to > > where I > > > > think explicit dependencies lead to a false expectation: > > > > > > > > ``` > > > > with TaskGroup("tg1"): > > > > task1 ≥≥ teardown_task > > > > > > > > task2 >> task3 >> task4 >> task5 > > > > ``` > > > > > > > > Does teardown run as soon as task 1 is finished, or when all of task1 > > and > > > > task5 are finished? > > > > > > > > I very strongly believe that teardown should only run at the end of a > > > > TaskGroup - a hard rule on this makes it easier for users to reason > > about > > > > and understand it. If it's only as a result of it's explicit > > dependencies > > > > then it means users have to reason about when each teardown task is > > run in > > > > each situation as it might be different from dag to dag. > > > > > > > > In this case the teardown is akin to a "finally" block in python, and > > the > > > > TaskGroup is the "try" block, which I hope is a concept that almost > > > > everyone writing DAGs will understand and be able to relate too. > > > > > > > > Teardown tasks are already special in a number of ways (clearing > > > > behaviour, special failure rules for resulting dag run, different > > trigger > > > > rule) so users need to know how it works. > > > > > > > > So I vote for keeping it implicit only, but if we as a community favour > > > > explicit only then we need to have an enforced requirement that there > > only > > > > leaves of a TG can be teardown (if they are in use) -- i.e. `[taak1, > > task5] > > > > >> teardown_task` would be required in this case. > > > > > > > > (And a similar role for set up. If there are any, the only root tasks > > in a > > > > TG must be set up) > > > > > > > > Ash > > > > > > > > On 23 March 2023 22:16:42 GMT, Pierre Jeambrun <pierrejb...@gmail.com> > > > > wrote: > > > > >I am also in favor of explicit relationships only. > > > > > > > > > >From a person who didn't work on AIP-52, it seems easier to understand > > > > what > > > > >is going on without having to dive into the setup/teardown > > documentation. > > > > > > > > > >Le jeu. 23 mars 2023 à 22:53, Jed Cunningham < > > jedcunning...@apache.org> a > > > > >écrit : > > > > > > > > > >> I've been working closely with Daniel on AIP-52 for a while now, but > > > > I'll > > > > >> still share my thoughts here. > > > > >> > > > > >> I'm also in favor of only supporting explicit relationships. > > > > >> > > > > >> In my opinion, even if multiple setup/teardown per scope never > > > > materialize, > > > > >> explicit relationships are still a better choice due to the clarity > > it > > > > >> brings today. > > > > >> > > > > > > > > > > > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org For additional commands, e-mail: dev-h...@airflow.apache.org