r-richmond commented on issue #14396:
URL: https://github.com/apache/airflow/issues/14396#issuecomment-784646194


   >Dataclass is not pythonic at all
   
   As a counter point [Dacite](https://github.com/konradhalas/dacite) is a 
python library that shows how dataclasses can be used for something like this. 
Their config dataclass is a similar object to our context and is defined as 
follows.
   
   
https://github.com/konradhalas/dacite/blob/d2206b2e4711859da0ea5862c395940f33693e80/dacite/config.py#L5-L12
   ```python
   @dataclass
   class Config:
       type_hooks: Dict[Type, Callable[[Any], Any]] = 
field(default_factory=dict)
       cast: List[Type] = field(default_factory=list)
       forward_references: Optional[Dict[str, Any]] = None
       check_types: bool = True
       strict: bool = False
       strict_unions_match: bool = False
   ```
   
   which is then consumed in their function `from_dict` 
   
https://github.com/konradhalas/dacite/blob/d2206b2e4711859da0ea5862c395940f33693e80/dacite/core.py#L34
   ```python
   def from_dict(data_class: Type[T], data: Data, config: Optional[Config] = 
None) -> T:
       """Create a data class instance from a dictionary.
       :param data_class: a data class type
       :param data: a dictionary of a input data
       :param config: a configuration of the creation process
       :return: an instance of a data class
   ```
   
   As a user it feels so much better to be able to see the config in the 
function definition and then follow the type definition straight to the config 
objects strict definition (fields, types, optionally comments). 
   
   Alternatively when a dictionary is used its harder for an end user to reason 
about `what keys are present?`, `what are the values?`, `what are the types of 
the values?`, `where do I find this information?`. (Perhaps its my own 
ignorance but it isn't obvious to me today where I can look up this info for 
airflow's context dictionary and I've tried).
   
   As a developer tools like 
[mypy](https://github.com/python/mypy/blob/538d36481526135c44b90383663eaa177cfc32e3/docs/source/additional_features.rst),
 [pydantic](https://github.com/samuelcolvin/pydantic/) & various ides 
([pycharm](https://blog.jetbrains.com/pycharm/2018/04/python-37-introducing-data-class/),
 vscode) also handle dataclasses quite well which helps prevent frustrating 
runtime errors. 
   
   
   >makes it difficult to evolve
   
   Not sure about this one.
   * To add new fields to data classes you simply add a new field to the 
dataclass definition with a default value.
   * To remove old fields you can use an ide's `find references` to remove all 
the old references and then delete the field from the dataclass.
       * Optionally, use a static analyzer like mypy before commit/merge to 
ensure you've gotten rid of all the old references
   
   Perhaps I'm still a rookie but for dictionaries it feels much harder to be 
sure you've gotten all the old key references out of the code, before you can 
safely remove it.
   
   >I think it's very pythonic to have dictionaries in Python and it brings a 
number of benefits, with little drawback
   
   I've called out a couple advantages I think dataclasses have over 
dictionaries. But I'm likely missing the dictionary advantages. Is Airflow 
using any of these advantages today?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to