I am also +1 for the idea of secrets backend. And agree with Fokko that we should support other backends as well.
T. On Mon, Mar 16, 2020 at 11:19 AM Kaxil Naik <[email protected]> wrote: > > Bumping this thread again. We have 2 "+1" votes on it currently (Me & > Jarek). > > Would love to hear other's opinion and thoughts. Personally I feel this is > a great addition and the one I was looking forward to for a long time. > > Regards, > Kaxil > > On Sun, Mar 8, 2020 at 8:33 PM Kaxil Naik <[email protected]> wrote: > > > Thanks, Daniel, a *big "+1"* to the AIP and the idea of integrating > > external vaults and using them as connections. Was in the roadmap of items > > I wanted to work on so very happy that it is coming together and we are > > already discussing it. > > > > I have also looked at the PR and for the most part, it looks good to me. > > Following are my comments > > > > > > 1. "*SecretsBackend*" is a more intuitive name in my opinion (we can > > have a vote in case we don't come to a conclusion :D) > > 2. I think we should have *creds_backend_kwargs* (either separate > > section or a config allowing JSON-string) or something similar to avoid > > hardcoding configs (like *aws_ssm_prefix*/*aws_ssm_profile_name*) > > related to a specific provider in *airflow.cfg*. > > If we don't do that we would need to hardcode these configs when I (or > > someone) adds support for Google's KMS and Hashicorp Vault :) > > 3. How do we authenticate to the Secrets/Creds Backend? > > For Google Cloud KMS or Hashicorp Vault, the main way of > > authenticating I think is using Environment Variable. > > Of course (2) can solve this, for example, we can pass > > *GOOGLE_APPLICATION_CREDENTIAL* JSON file path to *creds_backend_kwargs > > *which would then authenticate with GCP > > Or do we want to get that value from* Connection object stored in DB*? > > > > > > Regards, > > Kaxil > > > > > > On Sun, Mar 8, 2020 at 6:41 AM Daniel Standish <[email protected]> > > wrote: > > > >> Thanks Jarek for adding the PR link. In addition to forgetting to include > >> that, I put the wrong AIP number in subject -- fixed now (AIP-33 is the > >> correct AIP). > >> > >> I like the name Secrets -- I will let the other ideas percolate and will > >> look into your suggestion about using the plugin system. > >> > >> > >> On Sat, Mar 7, 2020 at 8:09 PM Jarek Potiuk <[email protected]> > >> wrote: > >> > >> > +1. I like it. And I like the general direction of the name as well not > >> to > >> > be tied with connections. > >> > > >> > I think it's also worth mentioning that there is PR in progress about it > >> > here: https://github.com/apache/airflow/pull/6376/files > >> > > >> > I have one observation though. We might not necessarily go in this > >> > direction or maybe defer it to later discussion, but I thought about > >> > extending the use of such CredentialsBackend and make it simply a > >> > SecretsBackend (that would be a nice solution to discussing various > >> names > >> > for it :)). Daniel - you might consider that as a "loose" comment. > >> > > >> > I do not want to hijack your topic here, so feel free to cut the > >> discussion > >> > here if you think it's going too far from the original proposal. > >> > > >> > Why don't we change it to a bit more generic "SecretsBackend" and > >> provide a > >> > more generic interface to use by the Hooks/Operators? I think the > >> current > >> > implementation is not very far from it. Additionally to just providing a > >> > "get_connection" method it could provide much more generic "get_secret" > >> and > >> > this would give the Hooks/Operators an opportunity to use it to retrieve > >> > all various kinds of secrets, not only Connections. We already have a > >> > half-baked solution for that with the _COMMAND variables and maybe we > >> can > >> > incorporate this in backwards-compatible way in the "SecretBackend' > >> > similarly as you did with Metastore and Env Backends. > >> > > >> > I think this would be a very handy feature - some of the secrets are not > >> > necessary in the URI form, they might be much more complex and contain > >> > better structured data and might be needed or available at different > >> times > >> > than when connection is retrieved. Often secrets are short-living and > >> they > >> > should be retrieved "just" before they are used - limiting it only to > >> when > >> > the Hook is created and only to URI is quite a limitation I think. > >> > > >> > Again - feel free to disregard that comment, I just thought it might be > >> a > >> > handy feature and maybe it's a good time to add it. > >> > > >> > J. > >> > > >> > > >> > On Sun, Mar 8, 2020 at 1:03 AM Daniel Standish <[email protected]> > >> > wrote: > >> > > >> > > We can currently retrieve connections from environment variables or > >> the > >> > > metastore database. > >> > > > >> > > AIP-33 > >> > > < > >> > > >> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-33+Creds+backend> > >> > > provides > >> > > a way to retrieve them from other sources, for example AWS SSM > >> parameter > >> > > store. > >> > > > >> > > There are many instances in airflow where we allow for user > >> customization > >> > > like this, for example with auth backend and hostname callable -- it > >> just > >> > > hasn't been done yet with creds. > >> > > > >> > > *How is it implemented?* > >> > > This adds a base class BaseCredsBackend which takes over BaseHook's > >> > > get_connections method. Then EnvironmentVariablesCredsBackend and > >> > > MetastoreCredsBackend are added as subclasses. New implementations > >> can > >> > be > >> > > added and user can configure precedence. E.g. instead of the default > >> env > >> > > vars > metastore, a user could specify SSM > metastore > env vars in > >> > > airflow.cfg. > >> > > > >> > > *Does this break anything?* > >> > > No. This is a relatively simple refactor that results in no change in > >> the > >> > > default behavior unless user modifies config to enable an alternative > >> > > backend. > >> > > > >> > > *Why is this worth adding?* > >> > > If implemented, you could store creds anywhere, as long as you can > >> write > >> > a > >> > > `get_connections` method that retrieves them. > >> > > This can make it possible for a team of devs to share one creds source > >> > > rather than each dev storing them in text files, for example. > >> > > Can also make it easier to spin up a dev instance since you don't > >> have to > >> > > worry about loading creds. > >> > > And if you don't have access to the airflow CLI (because you are on a > >> > cloud > >> > > platform perhaps) then this can provide an easy way to load and manage > >> > > creds. > >> > > > >> > > *Why do you call it creds?* > >> > > I don't need to call it creds. I could call it BaseConnectionBackend > >> or > >> > > BaseConnectionInfoBackend or any other thing -- I don't care too much. > >> > > There was something about calling it connection backend that was > >> > > unsatisfying. Perhaps because connection is a bit of an ambiguous > >> word. > >> > > So even though Connection is the name of the model that we instantiate > >> > when > >> > > retrieving these creds, creds seemed more specific and more > >> > > representative. But again if there is consensus on something else, I > >> am > >> > > happy to change. > >> > > This adds a `creds` package under `airflow`. Alternatively would we > >> > create > >> > > a `connections` package? Would this cause confusion relative to the > >> > > *model* > >> > > Connection? > >> > > Another thought: even though today this backend will produce a > >> connection > >> > > object, maybe in the future it can produce a different model -- either > >> > way, > >> > > this backend is still about retrieving *creds*. > >> > > > >> > > *Outstanding questions* > >> > > As mentioned above, some don't like using the word Creds because this > >> is > >> > > about producing connections. I totally get this. If there is general > >> > > consensus around another name, I am happy to change it. > >> > > I welcome any other suggestions or feedback on the structure of this > >> > > implementation. > >> > > > >> > > >> > > >> > -- > >> > > >> > Jarek Potiuk > >> > Polidea <https://www.polidea.com/> | Principal Software Engineer > >> > > >> > M: +48 660 796 129 <+48660796129> > >> > [image: Polidea] <https://www.polidea.com/> > >> > > >> > > -- Tomasz Urbaszek Polidea | Software Engineer M: +48 505 628 493 E: [email protected] Unique Tech Check out our projects!
