Yes, big +1 from my side as well. I think this is a requirement for any bigger company.
1. I think we should add support for KMS, Vault etc. 2. Make the current backend pluggable, and set the current as default (Airflow metastore) Cheers, Fokko Op ma 16 mrt. 2020 om 11:19 schreef Kaxil Naik <[email protected]>: > Bumping this thread again. We have 2 "+1" votes on it currently (Me & > Jarek). > > Would love to hear other's opinion and thoughts. Personally I feel this is > a great addition and the one I was looking forward to for a long time. > > Regards, > Kaxil > > On Sun, Mar 8, 2020 at 8:33 PM Kaxil Naik <[email protected]> wrote: > > > Thanks, Daniel, a *big "+1"* to the AIP and the idea of integrating > > external vaults and using them as connections. Was in the roadmap of > items > > I wanted to work on so very happy that it is coming together and we are > > already discussing it. > > > > I have also looked at the PR and for the most part, it looks good to me. > > Following are my comments > > > > > > 1. "*SecretsBackend*" is a more intuitive name in my opinion (we can > > have a vote in case we don't come to a conclusion :D) > > 2. I think we should have *creds_backend_kwargs* (either separate > > section or a config allowing JSON-string) or something similar to > avoid > > hardcoding configs (like *aws_ssm_prefix*/*aws_ssm_profile_name*) > > related to a specific provider in *airflow.cfg*. > > If we don't do that we would need to hardcode these configs when I (or > > someone) adds support for Google's KMS and Hashicorp Vault :) > > 3. How do we authenticate to the Secrets/Creds Backend? > > For Google Cloud KMS or Hashicorp Vault, the main way of > > authenticating I think is using Environment Variable. > > Of course (2) can solve this, for example, we can pass > > *GOOGLE_APPLICATION_CREDENTIAL* JSON file path to > *creds_backend_kwargs > > *which would then authenticate with GCP > > Or do we want to get that value from* Connection object stored in DB*? > > > > > > Regards, > > Kaxil > > > > > > On Sun, Mar 8, 2020 at 6:41 AM Daniel Standish <[email protected]> > > wrote: > > > >> Thanks Jarek for adding the PR link. In addition to forgetting to > include > >> that, I put the wrong AIP number in subject -- fixed now (AIP-33 is the > >> correct AIP). > >> > >> I like the name Secrets -- I will let the other ideas percolate and will > >> look into your suggestion about using the plugin system. > >> > >> > >> On Sat, Mar 7, 2020 at 8:09 PM Jarek Potiuk <[email protected]> > >> wrote: > >> > >> > +1. I like it. And I like the general direction of the name as well > not > >> to > >> > be tied with connections. > >> > > >> > I think it's also worth mentioning that there is PR in progress about > it > >> > here: https://github.com/apache/airflow/pull/6376/files > >> > > >> > I have one observation though. We might not necessarily go in this > >> > direction or maybe defer it to later discussion, but I thought about > >> > extending the use of such CredentialsBackend and make it simply a > >> > SecretsBackend (that would be a nice solution to discussing various > >> names > >> > for it :)). Daniel - you might consider that as a "loose" comment. > >> > > >> > I do not want to hijack your topic here, so feel free to cut the > >> discussion > >> > here if you think it's going too far from the original proposal. > >> > > >> > Why don't we change it to a bit more generic "SecretsBackend" and > >> provide a > >> > more generic interface to use by the Hooks/Operators? I think the > >> current > >> > implementation is not very far from it. Additionally to just > providing a > >> > "get_connection" method it could provide much more generic > "get_secret" > >> and > >> > this would give the Hooks/Operators an opportunity to use it to > retrieve > >> > all various kinds of secrets, not only Connections. We already have a > >> > half-baked solution for that with the _COMMAND variables and maybe we > >> can > >> > incorporate this in backwards-compatible way in the "SecretBackend' > >> > similarly as you did with Metastore and Env Backends. > >> > > >> > I think this would be a very handy feature - some of the secrets are > not > >> > necessary in the URI form, they might be much more complex and contain > >> > better structured data and might be needed or available at different > >> times > >> > than when connection is retrieved. Often secrets are short-living and > >> they > >> > should be retrieved "just" before they are used - limiting it only to > >> when > >> > the Hook is created and only to URI is quite a limitation I think. > >> > > >> > Again - feel free to disregard that comment, I just thought it might > be > >> a > >> > handy feature and maybe it's a good time to add it. > >> > > >> > J. > >> > > >> > > >> > On Sun, Mar 8, 2020 at 1:03 AM Daniel Standish <[email protected]> > >> > wrote: > >> > > >> > > We can currently retrieve connections from environment variables or > >> the > >> > > metastore database. > >> > > > >> > > AIP-33 > >> > > < > >> > > >> > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-33+Creds+backend> > >> > > provides > >> > > a way to retrieve them from other sources, for example AWS SSM > >> parameter > >> > > store. > >> > > > >> > > There are many instances in airflow where we allow for user > >> customization > >> > > like this, for example with auth backend and hostname callable -- it > >> just > >> > > hasn't been done yet with creds. > >> > > > >> > > *How is it implemented?* > >> > > This adds a base class BaseCredsBackend which takes over BaseHook's > >> > > get_connections method. Then EnvironmentVariablesCredsBackend and > >> > > MetastoreCredsBackend are added as subclasses. New implementations > >> can > >> > be > >> > > added and user can configure precedence. E.g. instead of the > default > >> env > >> > > vars > metastore, a user could specify SSM > metastore > env vars in > >> > > airflow.cfg. > >> > > > >> > > *Does this break anything?* > >> > > No. This is a relatively simple refactor that results in no change > in > >> the > >> > > default behavior unless user modifies config to enable an > alternative > >> > > backend. > >> > > > >> > > *Why is this worth adding?* > >> > > If implemented, you could store creds anywhere, as long as you can > >> write > >> > a > >> > > `get_connections` method that retrieves them. > >> > > This can make it possible for a team of devs to share one creds > source > >> > > rather than each dev storing them in text files, for example. > >> > > Can also make it easier to spin up a dev instance since you don't > >> have to > >> > > worry about loading creds. > >> > > And if you don't have access to the airflow CLI (because you are on > a > >> > cloud > >> > > platform perhaps) then this can provide an easy way to load and > manage > >> > > creds. > >> > > > >> > > *Why do you call it creds?* > >> > > I don't need to call it creds. I could call it > BaseConnectionBackend > >> or > >> > > BaseConnectionInfoBackend or any other thing -- I don't care too > much. > >> > > There was something about calling it connection backend that was > >> > > unsatisfying. Perhaps because connection is a bit of an ambiguous > >> word. > >> > > So even though Connection is the name of the model that we > instantiate > >> > when > >> > > retrieving these creds, creds seemed more specific and more > >> > > representative. But again if there is consensus on something else, > I > >> am > >> > > happy to change. > >> > > This adds a `creds` package under `airflow`. Alternatively would we > >> > create > >> > > a `connections` package? Would this cause confusion relative to the > >> > > *model* > >> > > Connection? > >> > > Another thought: even though today this backend will produce a > >> connection > >> > > object, maybe in the future it can produce a different model -- > either > >> > way, > >> > > this backend is still about retrieving *creds*. > >> > > > >> > > *Outstanding questions* > >> > > As mentioned above, some don't like using the word Creds because > this > >> is > >> > > about producing connections. I totally get this. If there is > general > >> > > consensus around another name, I am happy to change it. > >> > > I welcome any other suggestions or feedback on the structure of this > >> > > implementation. > >> > > > >> > > >> > > >> > -- > >> > > >> > Jarek Potiuk > >> > Polidea <https://www.polidea.com/> | Principal Software Engineer > >> > > >> > M: +48 660 796 129 <+48660796129> > >> > [image: Polidea] <https://www.polidea.com/> > >> > > >> > > >
