imo it is related but a separate discussion: the focus of this one is very
limited to only the Task Context dictionary. This is actually good since
there will be only one way to get configuration i.e. importing
airflow.configuration object instead of this other way -- more so because
there is still some way to go in determining (2) or (3)

For Task SDK: we have a spike task about it here:
https://github.com/apache/airflow/issues/44352. I have copy/pasted your
three options there.

On Tue, 10 Dec 2024 at 23:57, Jarek Potiuk <ja...@potiuk.com> wrote:

> I think that one should be preceded with another decision which we have not
> discussed and have not agreed to (and a bit of follow-up on Constance's
> question).
>
> Do we expect airflow configuration to be present also at the worker and
> should it be the same or different than the one in
> "scheduler/webserver/triggerer" ?
>
> So far (Airflow 2) we assumed (and even recommended to the users -
>
> https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html
> )
> that configuration should be the same for all components.
>
> Which is not good for security reasons because we have sensitive
> configuration, and users should not copy around those  config files and
> variables.
> IMHO we should completely remove this recommendation. And we have three
> possible options to tell our users:.
>
> 1) either copy to workers only the configuration that is needed (but which
> ones are needed and how to separate those config entries in the docs?)
> 2) or NO configuration should be present in the worker - and all
> configuration that is needed should be passed by Task SDK to the worker
> 3) or we assess that task and dag file processors do not need any
> configuration at all
>
> I think if we assess that 3) is the case - there is no other option but to
> remove conf. That would also be really nice as we would not have to have
> "conf" reading code in TaskSDK at all.
> But if we think that tasks and dag parsing will actually need some
> configuration, then we can go either 1) or 2) route
>
> I think option 2) is much better from less confusion, security and "single
> source of truth" point of view. And if we go for 2) then `namespace =
> conf.get("kubernetes", "NAMESPACE")` will not work because configuration
> will not be present there and in this case what would make more sense is to
> leave "conf" as part of the context but only pass the "needed"
> configuration there.
> If we go for option 1) then likely yes we could remove it from context.
>
> J.
>
>
>
>
>
>
>
>
> On Tue, Dec 10, 2024 at 7:03 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
>
> > Yeah, this PR only remove it from accessing from task context dict i.e.
> if
> > you try to do: BashOperator(bash_command="{{ conf.get('webserver',
> > 'base_url') }}" -- for example
> >
> > On Tue, 10 Dec 2024 at 23:27, Constance Martineau
> > <consta...@astronomer.io.invalid> wrote:
> >
> > > Will it still be possible to import and use configuration directly? For
> > > example:
> > >
> > > ```
> > > from airflow.configuration import conf
> > > from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import
> > > KubernetesPodOperator
> > >
> > >
> > > namespace = conf.get("kubernetes", "NAMESPACE")
> > >
> > > KubernetesPodOperator(
> > >     namespace=namespace,
> > >     image="<your-docker-image>",
> > >     cmds=["<commands-for-image>"],
> > >     arguments=["<arguments-for-image>"],
> > >     labels={"<pod-label>": "<label-name>"},
> > >     name="<pod-name>",
> > >     task_id="<task-name>",
> > >     get_logs=True,
> > >     in_cluster=True,
> > > )
> > > ```
> > >
> > > On Tue, Dec 10, 2024 at 12:52 PM Kaxil Naik <kaxiln...@gmail.com>
> wrote:
> > >
> > > > Hey team,
> > > >
> > > > I have a PR (https://github.com/apache/airflow/pull/44820) to remove
> > > conf
> > > > object from the Task context dictionary.
> > > >
> > > > This was initially added (in 2015) in response to
> > > > https://github.com/apache/airflow/issues/168. However, we now have
> > > > `ti.log_url` that is used for that; example usages:
> > > >
> > > >
> > > >
> > >
> >
> https://github.com/apache/airflow/blob/dcd41f60f1c9b5583b49bfb49b6d85c640a2892c/airflow/models/taskinstance.py#L1362
> > > >
> > > >
> > > >
> > >
> >
> https://github.com/apache/airflow/blob/dcd41f60f1c9b5583b49bfb49b6d85c640a2892c/providers/src/airflow/providers/smtp/notifications/templates/email.html#L28
> > > >
> > > >
> > > >
> > >
> >
> https://github.com/apache/airflow/blob/dcd41f60f1c9b5583b49bfb49b6d85c640a2892c/docs/apache-airflow/howto/email-config.rst?plain=1#L76
> > > >
> > > > So, to simplify what we need to pass from the API server to the Task
> > SDK
> > > > (for AF 3.0), I want to simplify and remove things that aren't
> needed.
> > In
> > > > this case, this is good so we don't pass/expose secrets unnecessarily
> > via
> > > > `conf` object in the execution environment.
> > > >
> > > > To that end, I am calling for a LAZY CONSENSUS to remove it. If no
> one
> > > > objects, the vote will pass three days from now, at 6:30 pm UTC, 13
> Dec
> > > > 2024.
> > > >
> > > > Regards,
> > > > Kaxil
> > > >
> > >
> >
>

Reply via email to