potiuk commented on PR #38015:
URL: https://github.com/apache/airflow/pull/38015#issuecomment-2041517914

   I think a little bit more context is needed on that one, becuase this is a 
follow-up and really fix of another problem introdued in 2.8 in 
https://github.com/apache/airflow/pull/33430  (where we allowed context params 
to not have defaults at all).
   
   This change is really codifying what has been explained for quite some time 
- since 2.7.0. I think we explicitly explained that `| None` should be used for 
typing
   
   From 
https://airflow.apache.org/docs/apache-airflow/2.8.0/core-concepts/taskflow.html#context
 :
   
   ```python
   @task
   def print_ti_info(task_instance: TaskInstance | None = None, dag_run: DagRun 
| None = None):
       print(f"Run ID: {task_instance.run_id}")  # Run ID: 
scheduled__2023-08-09T00:00:00+00:00
       print(f"Duration: {task_instance.duration}")  # Duration: 0.972019
       print(f"DAG Run queued at: {dag_run.queued_at}")  # 2023-08-10 
00:00:01+02:20
   ```
   
   This is how we explain that we expect None in those parameters used by the 
context. That's the "canonical" way I think we suggest people doing it since 
2.7. In 2.8 https://github.com/apache/airflow/pull/33430 introduced the 
possibility of "NOT" having the defaults.
   
   Technically speaking - `params = dag.params` is just a `mypy` hack, because 
the value set as default will be anyhow overwritten with the params from 
context.  And it might be misleading because people my expect it to be set. And 
it's a hack that you can do for only subset of arguments - like params  in a 
reasonable/easy way IMHO. 
   
   Say - for TaskInstance or DagRun - how would you provide defaults there? I 
think limiting only to params being set by dag.params as default is even more 
hack'y and confusing in this context.
   
   There is a class of difficult to spot errors that is avoided with this fix - 
for example this will not work as you think it will work (if you don't know 
that run_id will be injected by task_flow):
   
   ```python
   @task
   def print_ti_info(some_value="x", run_id: str = "my_run_id"):
        print(some_value)
        print(run_id)
   
   print_ti_info():
   ```
   
   In example above you might simply miss that the context param has the 
`run_id` parameter injected, there are other name clashes possible too - 
especially for task names overriding the context names accidentally. There is 
quite some ambiguity today and this change is going to address it.
   
   And there is another case - it's not `academic` problem either actually (See 
https://github.com/apache/airflow/issues/38006)  the change was not really 
implemented "because we thought it is a good idea" -  it is solving a real 
issue our user had -  when https://github.com/apache/airflow/pull/33430 was 
added - for some of our users they could see "default value cannot follow 
non-default one) - introduced by https://github.com/apache/airflow/pull/33430 - 
where we actually allowed context values not to provide defaults at all. This 
was when user used a parameter (`logical_date`) that was also accidentally a 
context parameter. So it detected a mistake the user could miss before.
   
   However yes - I agree that this change might have much bigger effect than we 
initially anticipated. Others **could** have used it for similar purpose to 
yours. We might want to make some other follow-ups after that if we see that 
this is a problem for others.
   
   The case you have @jscheffl  is really a hack, but one that is likely to be 
used.  I don't think it is strong enough reason to make RC4 - don't think so, 
while I did not get approval from @uranusjr and @Taragolis - we all thought 
it's a reasonable one, I think (I'd love to hear your thought here as well).
   
   But answering the typing question - one of the solutions (and a simple one) 
is:
   
   ```python
   @task
   def print_ti_info(params: ParamsDict | None = None, task_instance: 
TaskInstance | None = None, dag_run: DagRun | None = None):
       assert params and task_instance and dag_run
       ...
   ```
   
   A bit ugly, yes, but will work, and I personally think it's much more 
"straightforward" than smth like that:
   
   ```python
   def print_ti_info(params: ParamsDict = dag.param, task_instance: 
TaskInstance = TaskInstance(), dag_run: DagRun = DagRun()):
        ...
   ```
   
   Ideally (and maybe that should be a follow-up) we should be able to have 
(and mybe even publish) Airflow MyPy plugin - someone even requested one for 
default args #38796 recently 
https://github.com/apache/airflow/tree/main/dev/mypy/plugin (the ones we have 
work with default args and outputs).  Maybe it is possible - we could write a 
plugin that could lilely be used to also allow this case (though @uranusjr in 
https://github.com/apache/airflow/pull/33430 hinted that it might be 
impossible).
   
   ```python
   def print_ti_info(params: ParamsDict, task_instance: TaskInstance, dag_run: 
DagRun):
       ....
   
   print_ti_info()
   ```
   
   And maybe that's the best idea to actually do so for anyone who wants to use 
MyPy in this case?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to