I will mark this AIP <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-33+Secrets+backend> as accepted, given we now have 4 +1 binding votes and already 8 days have passed since this AIP was created.
+1 bindings: - Kaxil - Jarek - Tomek - Fokko I will now create a VOTE thread on the exiting PR. Regards, Kaxil On Mon, Mar 16, 2020 at 10:26 AM Tomasz Urbaszek < [email protected]> wrote: > I am also +1 for the idea of secrets backend. And agree with Fokko > that we should support other backends as well. > > T. > > > On Mon, Mar 16, 2020 at 11:19 AM Kaxil Naik <[email protected]> wrote: > > > > Bumping this thread again. We have 2 "+1" votes on it currently (Me & > > Jarek). > > > > Would love to hear other's opinion and thoughts. Personally I feel this > is > > a great addition and the one I was looking forward to for a long time. > > > > Regards, > > Kaxil > > > > On Sun, Mar 8, 2020 at 8:33 PM Kaxil Naik <[email protected]> wrote: > > > > > Thanks, Daniel, a *big "+1"* to the AIP and the idea of integrating > > > external vaults and using them as connections. Was in the roadmap of > items > > > I wanted to work on so very happy that it is coming together and we are > > > already discussing it. > > > > > > I have also looked at the PR and for the most part, it looks good to > me. > > > Following are my comments > > > > > > > > > 1. "*SecretsBackend*" is a more intuitive name in my opinion (we can > > > have a vote in case we don't come to a conclusion :D) > > > 2. I think we should have *creds_backend_kwargs* (either separate > > > section or a config allowing JSON-string) or something similar to > avoid > > > hardcoding configs (like *aws_ssm_prefix*/*aws_ssm_profile_name*) > > > related to a specific provider in *airflow.cfg*. > > > If we don't do that we would need to hardcode these configs when I > (or > > > someone) adds support for Google's KMS and Hashicorp Vault :) > > > 3. How do we authenticate to the Secrets/Creds Backend? > > > For Google Cloud KMS or Hashicorp Vault, the main way of > > > authenticating I think is using Environment Variable. > > > Of course (2) can solve this, for example, we can pass > > > *GOOGLE_APPLICATION_CREDENTIAL* JSON file path to > *creds_backend_kwargs > > > *which would then authenticate with GCP > > > Or do we want to get that value from* Connection object stored in > DB*? > > > > > > > > > Regards, > > > Kaxil > > > > > > > > > On Sun, Mar 8, 2020 at 6:41 AM Daniel Standish <[email protected]> > > > wrote: > > > > > >> Thanks Jarek for adding the PR link. In addition to forgetting to > include > > >> that, I put the wrong AIP number in subject -- fixed now (AIP-33 is > the > > >> correct AIP). > > >> > > >> I like the name Secrets -- I will let the other ideas percolate and > will > > >> look into your suggestion about using the plugin system. > > >> > > >> > > >> On Sat, Mar 7, 2020 at 8:09 PM Jarek Potiuk <[email protected] > > > > >> wrote: > > >> > > >> > +1. I like it. And I like the general direction of the name as well > not > > >> to > > >> > be tied with connections. > > >> > > > >> > I think it's also worth mentioning that there is PR in progress > about it > > >> > here: https://github.com/apache/airflow/pull/6376/files > > >> > > > >> > I have one observation though. We might not necessarily go in this > > >> > direction or maybe defer it to later discussion, but I thought about > > >> > extending the use of such CredentialsBackend and make it simply a > > >> > SecretsBackend (that would be a nice solution to discussing various > > >> names > > >> > for it :)). Daniel - you might consider that as a "loose" comment. > > >> > > > >> > I do not want to hijack your topic here, so feel free to cut the > > >> discussion > > >> > here if you think it's going too far from the original proposal. > > >> > > > >> > Why don't we change it to a bit more generic "SecretsBackend" and > > >> provide a > > >> > more generic interface to use by the Hooks/Operators? I think the > > >> current > > >> > implementation is not very far from it. Additionally to just > providing a > > >> > "get_connection" method it could provide much more generic > "get_secret" > > >> and > > >> > this would give the Hooks/Operators an opportunity to use it to > retrieve > > >> > all various kinds of secrets, not only Connections. We already have > a > > >> > half-baked solution for that with the _COMMAND variables and maybe > we > > >> can > > >> > incorporate this in backwards-compatible way in the "SecretBackend' > > >> > similarly as you did with Metastore and Env Backends. > > >> > > > >> > I think this would be a very handy feature - some of the secrets > are not > > >> > necessary in the URI form, they might be much more complex and > contain > > >> > better structured data and might be needed or available at different > > >> times > > >> > than when connection is retrieved. Often secrets are short-living > and > > >> they > > >> > should be retrieved "just" before they are used - limiting it only > to > > >> when > > >> > the Hook is created and only to URI is quite a limitation I think. > > >> > > > >> > Again - feel free to disregard that comment, I just thought it > might be > > >> a > > >> > handy feature and maybe it's a good time to add it. > > >> > > > >> > J. > > >> > > > >> > > > >> > On Sun, Mar 8, 2020 at 1:03 AM Daniel Standish < > [email protected]> > > >> > wrote: > > >> > > > >> > > We can currently retrieve connections from environment variables > or > > >> the > > >> > > metastore database. > > >> > > > > >> > > AIP-33 > > >> > > < > > >> > > > >> > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-33+Creds+backend> > > >> > > provides > > >> > > a way to retrieve them from other sources, for example AWS SSM > > >> parameter > > >> > > store. > > >> > > > > >> > > There are many instances in airflow where we allow for user > > >> customization > > >> > > like this, for example with auth backend and hostname callable -- > it > > >> just > > >> > > hasn't been done yet with creds. > > >> > > > > >> > > *How is it implemented?* > > >> > > This adds a base class BaseCredsBackend which takes over > BaseHook's > > >> > > get_connections method. Then EnvironmentVariablesCredsBackend and > > >> > > MetastoreCredsBackend are added as subclasses. New > implementations > > >> can > > >> > be > > >> > > added and user can configure precedence. E.g. instead of the > default > > >> env > > >> > > vars > metastore, a user could specify SSM > metastore > env vars > in > > >> > > airflow.cfg. > > >> > > > > >> > > *Does this break anything?* > > >> > > No. This is a relatively simple refactor that results in no > change in > > >> the > > >> > > default behavior unless user modifies config to enable an > alternative > > >> > > backend. > > >> > > > > >> > > *Why is this worth adding?* > > >> > > If implemented, you could store creds anywhere, as long as you can > > >> write > > >> > a > > >> > > `get_connections` method that retrieves them. > > >> > > This can make it possible for a team of devs to share one creds > source > > >> > > rather than each dev storing them in text files, for example. > > >> > > Can also make it easier to spin up a dev instance since you don't > > >> have to > > >> > > worry about loading creds. > > >> > > And if you don't have access to the airflow CLI (because you are > on a > > >> > cloud > > >> > > platform perhaps) then this can provide an easy way to load and > manage > > >> > > creds. > > >> > > > > >> > > *Why do you call it creds?* > > >> > > I don't need to call it creds. I could call it > BaseConnectionBackend > > >> or > > >> > > BaseConnectionInfoBackend or any other thing -- I don't care too > much. > > >> > > There was something about calling it connection backend that was > > >> > > unsatisfying. Perhaps because connection is a bit of an ambiguous > > >> word. > > >> > > So even though Connection is the name of the model that we > instantiate > > >> > when > > >> > > retrieving these creds, creds seemed more specific and more > > >> > > representative. But again if there is consensus on something > else, I > > >> am > > >> > > happy to change. > > >> > > This adds a `creds` package under `airflow`. Alternatively would > we > > >> > create > > >> > > a `connections` package? Would this cause confusion relative to > the > > >> > > *model* > > >> > > Connection? > > >> > > Another thought: even though today this backend will produce a > > >> connection > > >> > > object, maybe in the future it can produce a different model -- > either > > >> > way, > > >> > > this backend is still about retrieving *creds*. > > >> > > > > >> > > *Outstanding questions* > > >> > > As mentioned above, some don't like using the word Creds because > this > > >> is > > >> > > about producing connections. I totally get this. If there is > general > > >> > > consensus around another name, I am happy to change it. > > >> > > I welcome any other suggestions or feedback on the structure of > this > > >> > > implementation. > > >> > > > > >> > > > >> > > > >> > -- > > >> > > > >> > Jarek Potiuk > > >> > Polidea <https://www.polidea.com/> | Principal Software Engineer > > >> > > > >> > M: +48 660 796 129 <+48660796129> > > >> > [image: Polidea] <https://www.polidea.com/> > > >> > > > >> > > > > > > > -- > > Tomasz Urbaszek > Polidea | Software Engineer > > M: +48 505 628 493 > E: [email protected] > > Unique Tech > Check out our projects! >
