Regardless of any option, there isn't a strong case to keep it in the
Context dict -- which is why I want to separate the discussion of configs
to worker/parser etc to https://github.com/apache/airflow/issues/44352

On Wed, 11 Dec 2024 at 00:04, Kaxil Naik <kaxiln...@gmail.com> wrote:

> imo it is related but a separate discussion: the focus of this one is very
> limited to only the Task Context dictionary. This is actually good since
> there will be only one way to get configuration i.e. importing
> airflow.configuration object instead of this other way -- more so because
> there is still some way to go in determining (2) or (3)
>
> For Task SDK: we have a spike task about it here:
> https://github.com/apache/airflow/issues/44352. I have copy/pasted your
> three options there.
>
> On Tue, 10 Dec 2024 at 23:57, Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> I think that one should be preceded with another decision which we have
>> not
>> discussed and have not agreed to (and a bit of follow-up on Constance's
>> question).
>>
>> Do we expect airflow configuration to be present also at the worker and
>> should it be the same or different than the one in
>> "scheduler/webserver/triggerer" ?
>>
>> So far (Airflow 2) we assumed (and even recommended to the users -
>>
>> https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html
>> )
>> that configuration should be the same for all components.
>>
>> Which is not good for security reasons because we have sensitive
>> configuration, and users should not copy around those  config files and
>> variables.
>> IMHO we should completely remove this recommendation. And we have three
>> possible options to tell our users:.
>>
>> 1) either copy to workers only the configuration that is needed (but which
>> ones are needed and how to separate those config entries in the docs?)
>> 2) or NO configuration should be present in the worker - and all
>> configuration that is needed should be passed by Task SDK to the worker
>> 3) or we assess that task and dag file processors do not need any
>> configuration at all
>>
>> I think if we assess that 3) is the case - there is no other option but to
>> remove conf. That would also be really nice as we would not have to have
>> "conf" reading code in TaskSDK at all.
>> But if we think that tasks and dag parsing will actually need some
>> configuration, then we can go either 1) or 2) route
>>
>> I think option 2) is much better from less confusion, security and "single
>> source of truth" point of view. And if we go for 2) then `namespace =
>> conf.get("kubernetes", "NAMESPACE")` will not work because configuration
>> will not be present there and in this case what would make more sense is
>> to
>> leave "conf" as part of the context but only pass the "needed"
>> configuration there.
>> If we go for option 1) then likely yes we could remove it from context.
>>
>> J.
>>
>>
>>
>>
>>
>>
>>
>>
>> On Tue, Dec 10, 2024 at 7:03 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
>>
>> > Yeah, this PR only remove it from accessing from task context dict i.e.
>> if
>> > you try to do: BashOperator(bash_command="{{ conf.get('webserver',
>> > 'base_url') }}" -- for example
>> >
>> > On Tue, 10 Dec 2024 at 23:27, Constance Martineau
>> > <consta...@astronomer.io.invalid> wrote:
>> >
>> > > Will it still be possible to import and use configuration directly?
>> For
>> > > example:
>> > >
>> > > ```
>> > > from airflow.configuration import conf
>> > > from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import
>> > > KubernetesPodOperator
>> > >
>> > >
>> > > namespace = conf.get("kubernetes", "NAMESPACE")
>> > >
>> > > KubernetesPodOperator(
>> > >     namespace=namespace,
>> > >     image="<your-docker-image>",
>> > >     cmds=["<commands-for-image>"],
>> > >     arguments=["<arguments-for-image>"],
>> > >     labels={"<pod-label>": "<label-name>"},
>> > >     name="<pod-name>",
>> > >     task_id="<task-name>",
>> > >     get_logs=True,
>> > >     in_cluster=True,
>> > > )
>> > > ```
>> > >
>> > > On Tue, Dec 10, 2024 at 12:52 PM Kaxil Naik <kaxiln...@gmail.com>
>> wrote:
>> > >
>> > > > Hey team,
>> > > >
>> > > > I have a PR (https://github.com/apache/airflow/pull/44820) to
>> remove
>> > > conf
>> > > > object from the Task context dictionary.
>> > > >
>> > > > This was initially added (in 2015) in response to
>> > > > https://github.com/apache/airflow/issues/168. However, we now have
>> > > > `ti.log_url` that is used for that; example usages:
>> > > >
>> > > >
>> > > >
>> > >
>> >
>> https://github.com/apache/airflow/blob/dcd41f60f1c9b5583b49bfb49b6d85c640a2892c/airflow/models/taskinstance.py#L1362
>> > > >
>> > > >
>> > > >
>> > >
>> >
>> https://github.com/apache/airflow/blob/dcd41f60f1c9b5583b49bfb49b6d85c640a2892c/providers/src/airflow/providers/smtp/notifications/templates/email.html#L28
>> > > >
>> > > >
>> > > >
>> > >
>> >
>> https://github.com/apache/airflow/blob/dcd41f60f1c9b5583b49bfb49b6d85c640a2892c/docs/apache-airflow/howto/email-config.rst?plain=1#L76
>> > > >
>> > > > So, to simplify what we need to pass from the API server to the Task
>> > SDK
>> > > > (for AF 3.0), I want to simplify and remove things that aren't
>> needed.
>> > In
>> > > > this case, this is good so we don't pass/expose secrets
>> unnecessarily
>> > via
>> > > > `conf` object in the execution environment.
>> > > >
>> > > > To that end, I am calling for a LAZY CONSENSUS to remove it. If no
>> one
>> > > > objects, the vote will pass three days from now, at 6:30 pm UTC, 13
>> Dec
>> > > > 2024.
>> > > >
>> > > > Regards,
>> > > > Kaxil
>> > > >
>> > >
>> >
>>
>

Reply via email to