Regardless of any option, there isn't a strong case to keep it in the Context dict -- which is why I want to separate the discussion of configs to worker/parser etc to https://github.com/apache/airflow/issues/44352
On Wed, 11 Dec 2024 at 00:04, Kaxil Naik <kaxiln...@gmail.com> wrote: > imo it is related but a separate discussion: the focus of this one is very > limited to only the Task Context dictionary. This is actually good since > there will be only one way to get configuration i.e. importing > airflow.configuration object instead of this other way -- more so because > there is still some way to go in determining (2) or (3) > > For Task SDK: we have a spike task about it here: > https://github.com/apache/airflow/issues/44352. I have copy/pasted your > three options there. > > On Tue, 10 Dec 2024 at 23:57, Jarek Potiuk <ja...@potiuk.com> wrote: > >> I think that one should be preceded with another decision which we have >> not >> discussed and have not agreed to (and a bit of follow-up on Constance's >> question). >> >> Do we expect airflow configuration to be present also at the worker and >> should it be the same or different than the one in >> "scheduler/webserver/triggerer" ? >> >> So far (Airflow 2) we assumed (and even recommended to the users - >> >> https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html >> ) >> that configuration should be the same for all components. >> >> Which is not good for security reasons because we have sensitive >> configuration, and users should not copy around those config files and >> variables. >> IMHO we should completely remove this recommendation. And we have three >> possible options to tell our users:. >> >> 1) either copy to workers only the configuration that is needed (but which >> ones are needed and how to separate those config entries in the docs?) >> 2) or NO configuration should be present in the worker - and all >> configuration that is needed should be passed by Task SDK to the worker >> 3) or we assess that task and dag file processors do not need any >> configuration at all >> >> I think if we assess that 3) is the case - there is no other option but to >> remove conf. That would also be really nice as we would not have to have >> "conf" reading code in TaskSDK at all. >> But if we think that tasks and dag parsing will actually need some >> configuration, then we can go either 1) or 2) route >> >> I think option 2) is much better from less confusion, security and "single >> source of truth" point of view. And if we go for 2) then `namespace = >> conf.get("kubernetes", "NAMESPACE")` will not work because configuration >> will not be present there and in this case what would make more sense is >> to >> leave "conf" as part of the context but only pass the "needed" >> configuration there. >> If we go for option 1) then likely yes we could remove it from context. >> >> J. >> >> >> >> >> >> >> >> >> On Tue, Dec 10, 2024 at 7:03 PM Kaxil Naik <kaxiln...@gmail.com> wrote: >> >> > Yeah, this PR only remove it from accessing from task context dict i.e. >> if >> > you try to do: BashOperator(bash_command="{{ conf.get('webserver', >> > 'base_url') }}" -- for example >> > >> > On Tue, 10 Dec 2024 at 23:27, Constance Martineau >> > <consta...@astronomer.io.invalid> wrote: >> > >> > > Will it still be possible to import and use configuration directly? >> For >> > > example: >> > > >> > > ``` >> > > from airflow.configuration import conf >> > > from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import >> > > KubernetesPodOperator >> > > >> > > >> > > namespace = conf.get("kubernetes", "NAMESPACE") >> > > >> > > KubernetesPodOperator( >> > > namespace=namespace, >> > > image="<your-docker-image>", >> > > cmds=["<commands-for-image>"], >> > > arguments=["<arguments-for-image>"], >> > > labels={"<pod-label>": "<label-name>"}, >> > > name="<pod-name>", >> > > task_id="<task-name>", >> > > get_logs=True, >> > > in_cluster=True, >> > > ) >> > > ``` >> > > >> > > On Tue, Dec 10, 2024 at 12:52 PM Kaxil Naik <kaxiln...@gmail.com> >> wrote: >> > > >> > > > Hey team, >> > > > >> > > > I have a PR (https://github.com/apache/airflow/pull/44820) to >> remove >> > > conf >> > > > object from the Task context dictionary. >> > > > >> > > > This was initially added (in 2015) in response to >> > > > https://github.com/apache/airflow/issues/168. However, we now have >> > > > `ti.log_url` that is used for that; example usages: >> > > > >> > > > >> > > > >> > > >> > >> https://github.com/apache/airflow/blob/dcd41f60f1c9b5583b49bfb49b6d85c640a2892c/airflow/models/taskinstance.py#L1362 >> > > > >> > > > >> > > > >> > > >> > >> https://github.com/apache/airflow/blob/dcd41f60f1c9b5583b49bfb49b6d85c640a2892c/providers/src/airflow/providers/smtp/notifications/templates/email.html#L28 >> > > > >> > > > >> > > > >> > > >> > >> https://github.com/apache/airflow/blob/dcd41f60f1c9b5583b49bfb49b6d85c640a2892c/docs/apache-airflow/howto/email-config.rst?plain=1#L76 >> > > > >> > > > So, to simplify what we need to pass from the API server to the Task >> > SDK >> > > > (for AF 3.0), I want to simplify and remove things that aren't >> needed. >> > In >> > > > this case, this is good so we don't pass/expose secrets >> unnecessarily >> > via >> > > > `conf` object in the execution environment. >> > > > >> > > > To that end, I am calling for a LAZY CONSENSUS to remove it. If no >> one >> > > > objects, the vote will pass three days from now, at 6:30 pm UTC, 13 >> Dec >> > > > 2024. >> > > > >> > > > Regards, >> > > > Kaxil >> > > > >> > > >> > >> >