potiuk commented on issue #14396:
URL: https://github.com/apache/airflow/issues/14396#issuecomment-962588693


   > Looking at maybe taking this on. I also had a good bit of "what the heck 
is even IN `context` and what's it for?" when I first started looking at the 
codebase, so I can see the benefit to new contributors and it'll be a good one 
to learn, myself.
   
   Cool!
   
   I think we have to make sure that we know what can be achieved here and what 
the "real" benefits of context stronger naming and typing is. Just adding 
typying without being able to make good use of it is kinda useless.  I.e. where 
and for whom actually "more typing and autocompletable naming" in context would 
help. 
   
   Let me summarize my view on what the requirements here are and who are the 
"users" of each requirement. What we are talking about here is to make easier 
to "write" dags and a bit easier to read them as `context.ti` is much more 
readable than `context.['ti']`.  We could also thing about some kind of 
verification and flagging typos. 
   
   * old way of accessing the context should work without any changes 
(context['ti'] should continue to work, also the "wrongly_typoed" context 
parameters should behave according to `template_undefined` in DAG as in usual 
Jinja context)  - this is hard requirement which we cannot drop.
   
   * auto-complete should include hints from the docstrings that explain the 
meaning of parameters (some of the dates require longer explanation) - this 
would be the single most important "improvement" we can get from this and 
without it, changing the context has very little improvement because for new 
users you'd have to anyhow look-it-up separately, for old users - they know the 
context field by heart already anyway, so auto-complete is just small 
improvement for them.
   
   * the documentation about the context 
https://github.com/apache/airflow/blob/main/docs/apache-airflow/templates-ref.rst
 should be generated automatically from the code. We do not want to maintain 
those separately  - the "code" should be single source of truth for the 
documentation. For me this is also a "hard" requirement that we cannot drop.
   
   * auto-complete should work in the source code when you are referring 
context directly in "classic operator's execute" - this would help those who 
write new operators.
   
   * The `typing` part in the context used this way works with MyPy type checks 
(I am working on bringing it back in the new version after temporary disabling 
it now).  This would prevent a number of problems for people writing dags and 
mistaking types.
   
   * Ideally (but that might be more difficutl) the context fields should be 
auto-completeable when you use `@task` decorator. In `@task` decorator you can 
pass context fields as parameter  of the task, it would be great if we can find 
a way to get context fields to be also auto-completable there. This would be 
super-helpful for those who write DAGs following the "task flow" way (which is 
somethig that I think we all agree is the "default" in the futuere - we are 
converting all examples to follow it and I believe it produces much cleaner, 
less-boilerplate code. This would be fantastic if we can get it working (and if 
MyPy checks would work here as well). Not sure though if this is achievable.
   
   * Also it would also be good if we could achieve auto-complete in jinja 
statements. That is a "nice-to-have" and - if possible - would likely be 
IDE-specific, but might be a very nice addition. Though it is really useful 
only for users who use the "classic" way of developing DAGs which we move away 
from, so I think this is not crucial and we can easily skip that one.
   
   > 1. Would the ideal solution be to make a new module under `airflow.models` 
to define the Context TypedDict and import it into `taskinstance.py`, or just 
define it at the top of `airflow.models.taskinstance`?
   
   I am not sure whether TypedDict is the right solution here, if we look at 
the above list. I'd rather say that a dedicated Context class which mixes-in 
the dict and field access would be a better solution in this case. When it 
comes to docstrings and explanation of the fields, and auto-complete in @task 
decorator, I am not sure TypedDict is a good solution. But that needs some 
exploring I think.
   
   > 2. If/once I do define the TypedDict, are any other changes needed in 
order to implement it?   I have a rough implementation locally already and it 
seems to pass CI without any further changes, but want to make sure I'm not 
underestimating the scope of this.
   
   As above - I am not sure how many users will benefit from "TypedDict" 
approach. I belive making a dedicated Context class with mix-in like approach 
woudl serve more cases from the list above.
   
   > 3. Context appears to contain some other Dicts (`conf` and `var`, 
specifically), should I chase that rabbit and define those as well while I am 
at it?
   
   * var for sure not - vars depend on what is in the DB or Secret Manager so 
it will be next to impossible to know the variables without those.
   * conf - that's more possible. We alreday have a well defined, automatically 
updated docs for conf variables, and this woudl be a nice addition to be able 
to auto-complete/doc all the conf variables there. It's not crucial, but since 
you are at it, it might be good to implement if easy.
    
   > 4. It looks like we use both `datetime.datetime` and `pendulum.DateTime` 
in various places, which would we prefer to use here in the Context definition 
for the various timestamp fields?  [[ [Discussing 
here](https://apache-airflow.slack.com/archives/CCPRP7943/p1635954964150000) ]]
   
   I think pendulum.DateTime 
(https://airflow.apache.org/docs/apache-airflow/stable/templates-ref.html?highlight=next_ds#variables).
 I think we should look at the code and see what kind of "datetime" we have 
there already. But that is one of the very good side-effects of this change  - 
maybe MyPy will be able to detect when we mix the datetime's  and warn us here?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to