potiuk commented on PR #38015: URL: https://github.com/apache/airflow/pull/38015#issuecomment-2041517914
I think a little bit more context is needed on that one, becuase this is a follow-up and really fix of another problem introdued in 2.8 in https://github.com/apache/airflow/pull/33430 (where we allowed context params to not have defaults at all). This change is really codifying what has been explained for quite some time - since 2.7.0. I think we explicitly explained that `| None` should be used for typing From https://airflow.apache.org/docs/apache-airflow/2.8.0/core-concepts/taskflow.html#context : ```python @task def print_ti_info(task_instance: TaskInstance | None = None, dag_run: DagRun | None = None): print(f"Run ID: {task_instance.run_id}") # Run ID: scheduled__2023-08-09T00:00:00+00:00 print(f"Duration: {task_instance.duration}") # Duration: 0.972019 print(f"DAG Run queued at: {dag_run.queued_at}") # 2023-08-10 00:00:01+02:20 ``` This is how we explain that we expect None in those parameters used by the context. That's the "canonical" way I think we suggest people doing it since 2.7. In 2.8 https://github.com/apache/airflow/pull/33430 introduced the possibility of "NOT" having the defaults. Technically speaking - `params = dag.params` is just a `mypy` hack, because the value set as default will be anyhow overwritten with the params from context. And it might be misleading because people my expect it to be set. And it's a hack that you can do for only subset of arguments - like params in a reasonable/easy way IMHO. Say - for TaskInstance or DagRun - how would you provide defaults there? I think limiting only to params being set by dag.params as default is even more hack'y and confusing in this context. There is a class of difficult to spot errors that is avoided with this fix - for example this will not work as you think it will work (if you don't know that run_id will be injected by task_flow): ```python @task def print_ti_info(some_value="x", run_id: str = "my_run_id"): print(some_value) print(run_id) print_ti_info(): ``` In example above you might simply miss that the context param has the `run_id` parameter injected, there are other name clashes possible too - especially for task names overriding the context names accidentally. There is quite some ambiguity today and this change is going to address it. And there is another case - it's not `academic` problem either actually (See https://github.com/apache/airflow/issues/38006) the change was not really implemented "because we thought it is a good idea" - it is solving a real issue our user had - when https://github.com/apache/airflow/pull/33430 was added - for some of our users they could see "default value cannot follow non-default one) - introduced by https://github.com/apache/airflow/pull/33430 - where we actually allowed context values not to provide defaults at all. This was when user used a parameter (`logical_date`) that was also accidentally a context parameter. So it detected a mistake the user could miss before. However yes - I agree that this change might have much bigger effect than we initially anticipated. Others **could** have used it for similar purpose to yours. We might want to make some other follow-ups after that if we see that this is a problem for others. The case you have @jscheffl is really a hack, but one that is likely to be used. I don't think it is strong enough reason to make RC4 - don't think so, while I did not get approval from @uranusjr and @Taragolis - we all thought it's a reasonable one, I think (I'd love to hear your thought here as well). But answering the typing question - one of the solutions (and a simple one) is: ```python @task def print_ti_info(params: ParamsDict | None = None, task_instance: TaskInstance | None = None, dag_run: DagRun | None = None): assert params and task_instance and dag_run ... ``` A bit ugly, yes, but will work, and I personally think it's much more "straightforward" than smth like that: ```python def print_ti_info(params: ParamsDict = dag.param, task_instance: TaskInstance = TaskInstance(), dag_run: DagRun = DagRun()): ... ``` Ideally (and maybe that should be a follow-up) we should be able to have (and mybe even publish) Airflow MyPy plugin - someone even requested one for default args #38796 recently https://github.com/apache/airflow/tree/main/dev/mypy/plugin (the ones we have work with default args and outputs). Maybe it is possible - we could write a plugin that could lilely be used to also allow this case (though @uranusjr in https://github.com/apache/airflow/pull/33430 hinted that it might be impossible). ```python def print_ti_info(params: ParamsDict, task_instance: TaskInstance, dag_run: DagRun): .... print_ti_info() ``` And maybe that's the best idea to actually do so for anyone who wants to use MyPy in this case? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
