r-richmond commented on issue #14396: URL: https://github.com/apache/airflow/issues/14396#issuecomment-785250252
>TypedDict is far more pythonic (Hints are the way how types are usually added in Python) Hints are how dataclasses uses types as well. They aren't enforced at runtime, unless you use something like [pydantic](https://pydantic-docs.helpmanual.io). (side note [Dataclass PEP-557](https://www.python.org/dev/peps/pep-0557/) had support from Raymond & Guido which as far as I can tell gives it a lot of points in the pythonic category. Also Raymond gave a fantastic [talk on Dataclasses](https://www.youtube.com/watch?v=T-TwcmT6Rcw) that I'd recommend if you have time). >We should basically never remove/rename anything in the context (maybe we should use some deprecation mechanism). Yep makes sense to me. Although, this isn't a point for or against dataclasses, typedDicts, or dicts. >The context is used by pretty much every single custom operator out there, and the change you propose will immediately break all of them.. If anything else - this is a single reason why we should not do it. While switching to dataclasses would be harder than TypedDicts it does not have to be a breaking change. You can convert dictionaries to dataclasses and vice versa pretty seamlessly. Although, implementing Dataclasses would certainly be more work to get right in the initial pr. <details><summary>Example showing how you can switch between dataclasses & dictionaries</summary> ```Python from dataclasses import dataclass, asdict from dacite import from_dict @dataclass class Demo: id: str value: int d = Demo("a", 1) examples = [d, asdict(d), from_dict(data_class=Demo, data=asdict(d))] for example in examples: print(example, type(example)) ``` prints ``` Demo(id='a', value=1) <class '__main__.Demo'> {'id': 'a', 'value': 1} <class 'dict'> Demo(id='a', value=1) <class '__main__.Demo'> ``` </details> Personally I'm of the opinion that working with the dataclass api is superior to working with the dictionary api due to the reasons outlined above but [thats just like my opinion man](https://www.youtube.com/watch?v=1vBesOFURek). >But the idea of improving developer's productivity is sound and since we have TypedDict which is backwards compatible and gives most benefits you mentioned Either option could be backwards compatible but I agree that the TypedDict change is more straight forward and is an alternative, hence the tiny blurb about it in the opener. As you mentioned TypedDict should give most of the benefits, although to my knowledge find references does not work as well dictionary keys which is a bummer. Anyways the decision on which to use is above my Airflow status so I'll let the maintainers decide. Most importantly I think either a TypedDict or Dataclass implementation would be a big improvement over the current situation. >Would youl like to pick this up @r-richmond ? Actually, the reason I submitted this issue was because me and a coworker were frustrated with the "magic context dictionary". TLDR I'm not actually sure what all the keys, values, & types are in context so I wouldn't even know how to start this one. <details><summary>My end goal is to be able to do the following when writing callable that can have context passed to them</summary> ```Python def generate_is_latest_callable(tasks_if_latest: List[str] , tasks_if_not_latest: List[str]) -> Callable: def result(context: Context) -> List[str]: context. #and be able to get strong autocomplete and typing while in an ide here # because currently all I can do is context: Dict[str, Any] which isn't very helpful if context.something: return tasks_if_latest else: return tasks_if_not_latest return result t_branch = PythonBranchingOperator( task_id="branch", python_callable=generate_is_latest_callable(["yes"], ["no"]), provide_context=True, dag=dag, ) ``` Note: moving this goal to the opening comment </details> ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
