It's good that the discussion was started, the community won't be left in the shadows.
I agree that removal of `conf` is the right thing to do in the long run and thanks for adding the related approaches in the GH issue. Thanks & Regards, Amogh Desai On Wed, Dec 11, 2024 at 12:15 AM Jarek Potiuk <ja...@potiuk.com> wrote: > Sure. Then I am fine with removing it now. We can always re-add it before > now and later if we see a need. Or maybe it will just be added > "automatically " if we decide that context is the right way to pass the > information. > > On Tue, Dec 10, 2024 at 7:43 PM Kaxil Naik <kaxiln...@gmail.com> wrote: > > > Regardless of any option, there isn't a strong case to keep it in the > > Context dict -- which is why I want to separate the discussion of configs > > to worker/parser etc to https://github.com/apache/airflow/issues/44352 > > > > On Wed, 11 Dec 2024 at 00:04, Kaxil Naik <kaxiln...@gmail.com> wrote: > > > > > imo it is related but a separate discussion: the focus of this one is > > very > > > limited to only the Task Context dictionary. This is actually good > since > > > there will be only one way to get configuration i.e. importing > > > airflow.configuration object instead of this other way -- more so > because > > > there is still some way to go in determining (2) or (3) > > > > > > For Task SDK: we have a spike task about it here: > > > https://github.com/apache/airflow/issues/44352. I have copy/pasted > your > > > three options there. > > > > > > On Tue, 10 Dec 2024 at 23:57, Jarek Potiuk <ja...@potiuk.com> wrote: > > > > > >> I think that one should be preceded with another decision which we > have > > >> not > > >> discussed and have not agreed to (and a bit of follow-up on > Constance's > > >> question). > > >> > > >> Do we expect airflow configuration to be present also at the worker > and > > >> should it be the same or different than the one in > > >> "scheduler/webserver/triggerer" ? > > >> > > >> So far (Airflow 2) we assumed (and even recommended to the users - > > >> > > >> > > > https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html > > >> ) > > >> that configuration should be the same for all components. > > >> > > >> Which is not good for security reasons because we have sensitive > > >> configuration, and users should not copy around those config files > and > > >> variables. > > >> IMHO we should completely remove this recommendation. And we have > three > > >> possible options to tell our users:. > > >> > > >> 1) either copy to workers only the configuration that is needed (but > > which > > >> ones are needed and how to separate those config entries in the docs?) > > >> 2) or NO configuration should be present in the worker - and all > > >> configuration that is needed should be passed by Task SDK to the > worker > > >> 3) or we assess that task and dag file processors do not need any > > >> configuration at all > > >> > > >> I think if we assess that 3) is the case - there is no other option > but > > to > > >> remove conf. That would also be really nice as we would not have to > have > > >> "conf" reading code in TaskSDK at all. > > >> But if we think that tasks and dag parsing will actually need some > > >> configuration, then we can go either 1) or 2) route > > >> > > >> I think option 2) is much better from less confusion, security and > > "single > > >> source of truth" point of view. And if we go for 2) then `namespace = > > >> conf.get("kubernetes", "NAMESPACE")` will not work because > configuration > > >> will not be present there and in this case what would make more sense > is > > >> to > > >> leave "conf" as part of the context but only pass the "needed" > > >> configuration there. > > >> If we go for option 1) then likely yes we could remove it from > context. > > >> > > >> J. > > >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> On Tue, Dec 10, 2024 at 7:03 PM Kaxil Naik <kaxiln...@gmail.com> > wrote: > > >> > > >> > Yeah, this PR only remove it from accessing from task context dict > > i.e. > > >> if > > >> > you try to do: BashOperator(bash_command="{{ conf.get('webserver', > > >> > 'base_url') }}" -- for example > > >> > > > >> > On Tue, 10 Dec 2024 at 23:27, Constance Martineau > > >> > <consta...@astronomer.io.invalid> wrote: > > >> > > > >> > > Will it still be possible to import and use configuration > directly? > > >> For > > >> > > example: > > >> > > > > >> > > ``` > > >> > > from airflow.configuration import conf > > >> > > from airflow.providers.cncf.kubernetes.operators.kubernetes_pod > > import > > >> > > KubernetesPodOperator > > >> > > > > >> > > > > >> > > namespace = conf.get("kubernetes", "NAMESPACE") > > >> > > > > >> > > KubernetesPodOperator( > > >> > > namespace=namespace, > > >> > > image="<your-docker-image>", > > >> > > cmds=["<commands-for-image>"], > > >> > > arguments=["<arguments-for-image>"], > > >> > > labels={"<pod-label>": "<label-name>"}, > > >> > > name="<pod-name>", > > >> > > task_id="<task-name>", > > >> > > get_logs=True, > > >> > > in_cluster=True, > > >> > > ) > > >> > > ``` > > >> > > > > >> > > On Tue, Dec 10, 2024 at 12:52 PM Kaxil Naik <kaxiln...@gmail.com> > > >> wrote: > > >> > > > > >> > > > Hey team, > > >> > > > > > >> > > > I have a PR (https://github.com/apache/airflow/pull/44820) to > > >> remove > > >> > > conf > > >> > > > object from the Task context dictionary. > > >> > > > > > >> > > > This was initially added (in 2015) in response to > > >> > > > https://github.com/apache/airflow/issues/168. However, we now > > have > > >> > > > `ti.log_url` that is used for that; example usages: > > >> > > > > > >> > > > > > >> > > > > > >> > > > > >> > > > >> > > > https://github.com/apache/airflow/blob/dcd41f60f1c9b5583b49bfb49b6d85c640a2892c/airflow/models/taskinstance.py#L1362 > > >> > > > > > >> > > > > > >> > > > > > >> > > > > >> > > > >> > > > https://github.com/apache/airflow/blob/dcd41f60f1c9b5583b49bfb49b6d85c640a2892c/providers/src/airflow/providers/smtp/notifications/templates/email.html#L28 > > >> > > > > > >> > > > > > >> > > > > > >> > > > > >> > > > >> > > > https://github.com/apache/airflow/blob/dcd41f60f1c9b5583b49bfb49b6d85c640a2892c/docs/apache-airflow/howto/email-config.rst?plain=1#L76 > > >> > > > > > >> > > > So, to simplify what we need to pass from the API server to the > > Task > > >> > SDK > > >> > > > (for AF 3.0), I want to simplify and remove things that aren't > > >> needed. > > >> > In > > >> > > > this case, this is good so we don't pass/expose secrets > > >> unnecessarily > > >> > via > > >> > > > `conf` object in the execution environment. > > >> > > > > > >> > > > To that end, I am calling for a LAZY CONSENSUS to remove it. If > no > > >> one > > >> > > > objects, the vote will pass three days from now, at 6:30 pm UTC, > > 13 > > >> Dec > > >> > > > 2024. > > >> > > > > > >> > > > Regards, > > >> > > > Kaxil > > >> > > > > > >> > > > > >> > > > >> > > > > > >