potiuk commented on pull request #8974:
URL: https://github.com/apache/airflow/pull/8974#issuecomment-644710427


   > One for the VaultHook, and one for the extra auth types -- if a user is 
wanting more Vault auth types and is looking through the changelog they 
couldn't know it's been added.
   
   That's fine - I can split it, but I would like to have clarity on the 
architecture so I do not have to again refactor all the tests. 
   
   @ashb -> Can you please take a look at the way it is implemented now? It is 
also super important to figure out if we are all OK with this approach as I 
already started doing that for other Secret backends.
   
   For now, I moved the client to hook.py but I still think it makes sense to 
keep it as a separate entity without dependency on whether we parse 
authentication information from the connection or from the Backend dictionary 
stored in the configuration.
   
   Let me explain my thinking behind the architecture and maybe you will see 
why I have done it this way.
   
   Specifically, - those are taken from connection always (and you should not 
pass them IMHO via hook initialization parameters). Those are the "basic" 
parameters that Connection provides always (and main reason why Connection is 
encrypted with Fernet Key - to keep them secret):
   
   ```
               username=self.connection.login,
               password=self.connection.password,
               key_id=self.connection.login,
               secret_id=self.connection.password,
   ```
   
   I believe this is what the `Hook` meaning is in Airflow - it should take the 
connection and read all the "basic" information from the connection. Then maybe 
reads some extras (also from connection) - those being overridable via 
initialization parameters. 
   
   This is is how I see a Hook "pattern" in Airflow. Hooks can use other 
libraries to make the calls. I see the "client" as internal library that hides 
the complexity of the actual authentication method to use based on parameters 
passed.
   
   I see a few options of using Backend <> Hook in this scenario:
   
   1)  we build an artificial connection and pass it to hook (very bad idea 
IMHO) 
   
   2) Get Secret to use Hook directly. For that we will have to change the Hook 
to accept the basic parameters (host/port etc.) not only from connection but 
also if there is a lack of connection it should have additional parameters for 
host, port, user, passwords (or maybe just url)  from the parameters passed. I 
believe this is what you propose @ashb ? Correct?
   
   3) Implement intermediary used by both Hook and Secret - (Client in my case) 
which is library used by both - Hook and Backend. Client has no relation to 
Connection - it's Hook's job to provide all the parameters to the Client parsed 
from the connection.
   
   
   For 2), I think exposing password/URL/login via Hook constructor is a bad 
idea. The hook should encourage connection as the only source of the 
connectivity information - because it is encrypted and shared with other 
workers by design. Exposing host/url/ etc. and allowing connection to be None 
is a bad idea for Hook IMHO. When you look at BaseHook, the only "instance" 
method in this is what you see -> clearly you expect some relation with the 
Connection object:
   
   ```
       @classmethod
       def get_connections(cls, conn_id: str) -> List[Connection]:
   
       @classmethod
       def get_connection(cls, conn_id: str) -> Connection:
   
       @classmethod
       def get_hook(cls, conn_id: str) -> "BaseHook":
   
       def get_conn(self):
   ```
   
   I'd love to understand what's really wrong with approach 2) in this context.
   
   Is the problem with documentation duplication?  I think the separation of 
concerns is fairly solid here. 
    One thing that I might consider to remove some of the documentation 
duplication (which is clearly there) to make a Named Tuple to keep all the 
Vault parameters and use it in Client and Hook.
   
   Please let me know what you think @kaxil - I'd love your opinion as well 
before I move forward with it.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to