I like the idea. I‘d propose to write-up an AIP so that details of the concept can be discussed before raising a PR if you feel details might need to be aligned.
My concern primarily would be that if somebody uses this and uses external data or parameters from DB that this could bring-down performance or reduce availability for scheduling because the code might be called often. Bit this should be just documented and sit in users/admin responsibility. Sent from Outlook for iOS<https://aka.ms/o0ukef> ________________________________ From: Pierre Jeambrun <pierrejb...@gmail.com> Sent: Friday, February 2, 2024 11:26:47 PM To: dev@airflow.apache.org <dev@airflow.apache.org> Subject: Re: Idea for Discussion: custom TI dependencies I thought that it would allow users to code their own rules and provide them to the tasks somehow. If this is restricted to admin or deployment managers through plugins or other mechanism, I guess there is no issue here. Thanks for answering my concerns. On Fri 2 Feb 2024 at 22:53, Constance Martineau <consta...@astronomer.io.invalid> wrote: > Not missing anything. It is mainly used for deferrable operators, but feels > like an acceptable place to run user-defined/non-Airflow code portion of > this, and carry out the custom dependency checks. Assuming of course that > these checks are meant to check external conditions and will never need > access to Airflow's DB. Once the condition is met, then the scheduler could > carry on scheduling the task. > > I like the idea, and it was just a thought on how to get around the > security concern Pierre brought up. > > On Fri, Feb 2, 2024 at 4:35 PM Xiaodong (XD) DENG > <xd.d...@apple.com.invalid> > wrote: > > > Thanks both for your inputs! > > > > > > Hi Pierre, > > > > I think the key difference here is: by doing this, we are not allowing > > Airflow “users” to run their code in scheduler. We are only allowing > > Airflow “Admins” to deploy a plugin to run in scheduler, just the same as > > dag_policy/task_policy/task_instance_mutation_hook/pod_mutation_hook. > > > > So I do not think this violates our current preference in terms of > > security. > > > > > > Hi Constance, > > > > I thought the trigger is mainly for deferrable operator cases? It’s quite > > different scenario from what I’m trying to cover here IMHO. > > Did I miss anything? Please let me know. > > > > > > Thanks again! Looking forward to more questions/comments! > > > > > > XD > > > > > > > On Feb 2, 2024, at 13:29, Constance Martineau <consta...@astronomer.io > .INVALID> > > wrote: > > > > > > Naive question: Instead of running the code on the scheduler - could > the > > > condition check be delegated to the triggerer? > > > > > > On Fri, Feb 2, 2024 at 2:33 PM Pierre Jeambrun <pierrejb...@gmail.com> > > > wrote: > > > > > >> But maybe it’s time to reconsider that :), curious to see what others > > >> think. > > >> > > >> On Fri 2 Feb 2024 at 20:30, Pierre Jeambrun <pierrejb...@gmail.com> > > wrote: > > >> > > >>> I like the idea and I understand that it might help in some use > cases. > > >>> > > >>> The first concern that I have is that it would allow user code to run > > in > > >>> the scheduler, if I understand correctly. This would have big > > >> implications > > >>> in terms of security and how our security model works. (For instance > > the > > >>> scheduler is a trusted component and has direct access to the DB, > > AIP-44 > > >>> assumption) > > >>> > > >>> If I remember correctly this is a route that we specifically tried to > > >> stay > > >>> away from. > > >>> > > >>> On Fri 2 Feb 2024 at 20:03, Xiaodong (XD) DENG > > <xd.d...@apple.com.invalid > > >>> > > >>> wrote: > > >>> > > >>>> Hi folks, > > >>>> > > >>>> I’m writing to share my thought regarding the possibility of > > supporting > > >>>> “custom TI dependencies”. > > >>>> > > >>>> Currently we maintain the dependency check rules under > > >>>> “airflow.ti_deps.deps". They cover the dependency checks like if > there > > >> are > > >>>> available pool slot/if the concurrency allows/TI trigger rules/if > the > > >> state > > >>>> is valid, etc., and play essential role in the scheduling process. > > >>>> > > >>>> One idea was brought up in our team's internal discussion: why > > shouldn’t > > >>>> we support custom TI dependencies? > > >>>> > > >>>> In details: just like the cluster policies > > >>>> > > (dag_policy/task_policy/task_instance_mutation_hook/pod_mutation_hook), > > >> if > > >>>> we support users add their own dependency checks as custom classes > > (and > > >>>> also put under airflow_local_settings.py), it will allow users to > have > > >> much > > >>>> higher flexibility in the TI scheduling. These custom TI deps should > > be > > >>>> added as additions to the existing default deps (not replacing or > > >> removing > > >>>> any of them). > > >>>> > > >>>> For example: similar to check for pool availability/concurrency, the > > job > > >>>> may need to check for user’s infra-specific conditions, like if a > GPU > > is > > >>>> available right now (instead of competing with other jobs randomly), > > or > > >> if > > >>>> an external system API is ready to be called (otherwise wait a bit > ). > > >> And a > > >>>> lot more other possibilities. > > >>>> > > >>>> Why cluster policies won’t help here? task_instance_mutation_hook > is > > >>>> executed in a “worker”, not in the DAG file processor, just before > the > > >> TI > > >>>> is executed. What we are trying to gain some control here, though, > is > > in > > >>>> the scheduling process (based on custom rules, to decide if the TI > > state > > >>>> should be updated so it can be scheduled for execution). > > >>>> > > >>>> I would love to know how community finds this idea, before we start > to > > >>>> implement anything. Any quesiton/suggestion would be greatly > > >> appreciated. > > >>>> Many thanks! > > >>>> > > >>>> > > >>>> XD > > >>>> > > >>>> > > >>>> > > >> > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org > > For additional commands, e-mail: dev-h...@airflow.apache.org > > > > >