eskarimov commented on a change in pull request #19835:
URL: https://github.com/apache/airflow/pull/19835#discussion_r757859224
##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -64,10 +66,12 @@
class RunState:
"""Utility class for the run state concept of Databricks runs."""
- def __init__(self, life_cycle_state: str, result_state: str,
state_message: str) -> None:
+ def __init__(
+ self, life_cycle_state: str, state_message: str, result_state: str =
None, *args, **kwargs
Review comment:
It needs to be reviewed together with the changes for initialising the
class instance out of API response:
Current version:
```python
state = response['state']
life_cycle_state = state['life_cycle_state']
# result_state may not be in the state if not terminal
result_state = state.get('result_state', None)
state_message = state['state_message']
return RunState(life_cycle_state, result_state, state_message)
```
Proposed version:
```python
state = response['state']
return RunState(**state)
```
Current version is basically an intermediate layer between the API response
and class, extracting values out of the API response and initialising class
instance. But actually the response should already represent a state, why do we
need this layer then?
I see the following drawbacks with it:
- Class signature doesn't tell that `result_state` might be missing if state
is not terminal. Currently it's described with the comment deep in the code.
- It tends to increase repeating code - let's say we want to introduce async
class for `DatabricksHook`. This logic needs to be written twice. Also in case
we want to change the class in the future, let's say add new property
`user_cancelled_or_timedout` (which is already a part of the API response),
then we need to change class arguments, parsing response logic and class
instance initialisation everywhere it's used.
With the proposed version, we only need to change class arguments.
With all the above, answering the questions:
- > why do we need *args, **kwargs ?
It shows that RunState might receive other init arguments (since we
don't have control over API response), see above example with
`user_cancelled_or_timedout` in the response.
- > why to change order of the parameters? They are logical now: lifecycle
-> result state -> state message.
Just because of Python syntax, we need to put arguments with default
values after required arguments.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]