potiuk edited a comment on pull request #8974:
URL: https://github.com/apache/airflow/pull/8974#issuecomment-644710427
> One for the VaultHook, and one for the extra auth types -- if a user is
wanting more Vault auth types and is looking through the changelog they
couldn't know it's been added.
That's fine - I can split it, but I would like to have clarity on the
architecture so I do not have to again refactor all the tests.
@ashb -> Can you please take a look at the way it is implemented now? It is
also super important to figure out if we are all OK with this approach as I
already started doing that for other Secret backends.
For now, I moved the client to `hooks/vault.py` but I still think it makes
sense to keep it as a separate entity without dependency on whether we parse
authentication information from the connection or from the Backend dictionary
stored in the configuration.
Let me explain my thinking behind the architecture and maybe you will see
why I have done it this way.
Specifically, - those are taken from connection always (and you should not
pass them IMHO via hook initialization parameters). Those are the "basic"
parameters that Connection provides always (and main reason why Connection is
encrypted with Fernet Key - to keep them secret):
```
username=self.connection.login,
password=self.connection.password,
key_id=self.connection.login,
secret_id=self.connection.password,
```
I believe this is what the `Hook` meaning is in Airflow - it should take the
connection and read all the "basic" information from the connection. Then maybe
reads some extras (also from connection) - those being overridable via
initialization parameters.
This is is how I see a Hook "pattern" in Airflow. Hooks can use other
libraries to make the calls. I see the "client" as internal library that hides
the complexity of the actual authentication method to use based on parameters
passed.
I see a few options of using Backend <> Hook in this scenario:
1) we build an artificial connection and pass it to hook (very bad idea
IMHO)
2) Get Secret to use Hook directly. For that we will have to change the Hook
to accept the basic parameters (host/port etc.) not only from connection but
also if there is a lack of connection it should have additional parameters for
host, port, user, passwords (or maybe just url) from the parameters passed. I
believe this is what you propose @ashb ? Correct?
3) Implement intermediary used by both Hook and Secret - (Client in my case)
which is library used by both - Hook and Backend. Client has no relation to
Connection - it's Hook's job to provide all the parameters to the Client parsed
from the connection.
For 2), I think exposing password/URL/login via Hook constructor is a bad
idea. The hook should encourage connection as the only source of the
connectivity information - because it is encrypted and shared with other
workers by design. Exposing host/url/ etc. and allowing connection to be None
is a bad idea for Hook IMHO. When you look at BaseHook, the only "instance"
method in this is what you see -> clearly you expect some relation with the
Connection object:
```
@classmethod
def get_connections(cls, conn_id: str) -> List[Connection]:
@classmethod
def get_connection(cls, conn_id: str) -> Connection:
@classmethod
def get_hook(cls, conn_id: str) -> "BaseHook":
def get_conn(self):
```
I'd love to understand what's really wrong with approach 2) in this context.
Is the problem with documentation duplication? I think the separation of
concerns is fairly solid here.
One thing that I might consider to remove some of the documentation
duplication (which is clearly there) to make a Named Tuple to keep all the
Vault parameters and use it in Client and Hook.
Please let me know what you think @kaxil - I'd love your opinion as well
before I move forward with it.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]