eskarimov commented on a change in pull request #19835:
URL: https://github.com/apache/airflow/pull/19835#discussion_r757859224



##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -64,10 +66,12 @@
 class RunState:
     """Utility class for the run state concept of Databricks runs."""
 
-    def __init__(self, life_cycle_state: str, result_state: str, 
state_message: str) -> None:
+    def __init__(
+        self, life_cycle_state: str, state_message: str, result_state: str = 
None, *args, **kwargs

Review comment:
       It needs to be reviewed together with the changes for initialising the 
class instance out of API response:
   
   Current version:
   ```python
           state = response['state']
           life_cycle_state = state['life_cycle_state']
           # result_state may not be in the state if not terminal
           result_state = state.get('result_state', None)
           state_message = state['state_message']
           return RunState(life_cycle_state, result_state, state_message)
   ```
   Proposed version:
   ```python
           state = response['state']
           return RunState(**state)
   ```
   
   Current version is basically an intermediate layer between the API response 
and class, extracting values out of the API response and initialising class 
instance. But actually the response should already represent a state, why do we 
need this layer then?
   I see the following drawbacks with it:
   - Class signature doesn't tell that `result_state` might be missing if state 
is not terminal. Currently it's described with the comment deep in the code.
   - It tends to increase repeating code - let's say we want to introduce async 
class for `DatabricksHook`. This logic needs to be written twice. Also in case 
we want to change the class in the future, let's say add new property 
`user_cancelled_or_timedout` (which is already a part of the API response), 
then we need to change class arguments, parsing response logic and class 
instance initialisation everywhere it's used.
     With the proposed version, we only need to change class arguments.
   
   With all the above, answering the questions:
   - > why do we need *args, **kwargs ?
         
        It shows that RunState might receive other init arguments (since we 
don't have control over API response), see above example with 
`user_cancelled_or_timedout` in the response.
   
   - > why to change order of the parameters? They are logical now: lifecycle 
-> result state -> state message.
   
        Just because of Python syntax, we need to put arguments with default 
values after required arguments.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to