dstandish edited a comment on issue #20398:
URL: https://github.com/apache/airflow/issues/20398#issuecomment-998894798


   My take is that generally speaking _hooks_ should not manage state like 
this; instead, that is the job of the thing using the hook (be it an operator 
or a function as in `@task`).
   
   When it's managed by the caller then you don't  have side effects and hidden 
behavior that you need to remember or document.  When there's a good reason 
fine but when not I'd say it's better not to.
   
   In that bigquery **hook** example there's not much benefit in setting that 
attribute.
   
   Instead  one can manage it in the operator:
   
   ```python
   running_job_id = hook.run_query(...)
   ```
   
   In the **operator** there can certainly be legitimate reasons for setting an 
instance attribute like this because it can be used in `on_kill`.   XCom is 
totally different issue and certainly no complaints there.  My concern is just 
with the hook.
   
   And with hooks I wouldn't necessarily say it should _never_ be done.  We did 
it 
[here](https://github.com/apache/airflow/blob/main/airflow/hooks/subprocess.py#L76)
 with subprocess hook.  But there was at least a half-decent reason.  We use 
that attribute in on_kill.  We can't return it from `run_command` because the 
method waits sychronously for the command to complete.  We considered exactly 
this question 
[here](https://github.com/apache/airflow/pull/13423#issuecomment-753511787), 
and thought  about returning a runner object  that managed the state of the 
command but figured it was not worth the complexity tradeoff.  But maybe we 
should have done something like that.
   
   But in the snowflake case,  as well as the bigquery case, the query ids are 
not known until after they've completed so there's really no reason to store 
them on the instance.  It's just retrospective metadata at that point.
   
   And looking ahead, if we were to add a `run_async` method to snowflake hook 
(which could be useful with deferrable operators) then we should return the 
query  id, not store it.  Because then you can reuse the hook repeatedly, for 
multiple commands, and the hook doesn't have to make any assumptions about what 
the behavior of the operator willl be.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to