I will mark this AIP
<https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-33+Secrets+backend>
as accepted, given we now have 4 +1 binding votes and already 8 days have
passed since this AIP was created.

+1 bindings:

   - Kaxil
   - Jarek
   - Tomek
   - Fokko

I will now create a VOTE thread on the exiting PR.

Regards,
Kaxil


On Mon, Mar 16, 2020 at 10:26 AM Tomasz Urbaszek <
[email protected]> wrote:

> I am also +1 for the idea of secrets backend. And agree with Fokko
> that we should support other backends as well.
>
> T.
>
>
> On Mon, Mar 16, 2020 at 11:19 AM Kaxil Naik <[email protected]> wrote:
> >
> > Bumping this thread again. We have 2 "+1" votes on it currently (Me &
> > Jarek).
> >
> > Would love to hear other's opinion and thoughts. Personally I feel this
> is
> > a great addition and the one I was looking forward to for a long time.
> >
> > Regards,
> > Kaxil
> >
> > On Sun, Mar 8, 2020 at 8:33 PM Kaxil Naik <[email protected]> wrote:
> >
> > > Thanks, Daniel, a *big "+1"* to the AIP and the idea of integrating
> > > external vaults and using them as connections. Was in the roadmap of
> items
> > > I wanted to work on so very happy that it is coming together and we are
> > > already discussing it.
> > >
> > > I have also looked at the PR and for the most part, it looks good to
> me.
> > > Following are my comments
> > >
> > >
> > >    1. "*SecretsBackend*" is a more intuitive name in my opinion (we can
> > >    have a vote in case we don't come to a conclusion :D)
> > >    2. I think we should have *creds_backend_kwargs* (either separate
> > >    section or a config allowing JSON-string) or something similar to
> avoid
> > >    hardcoding configs (like *aws_ssm_prefix*/*aws_ssm_profile_name*)
> > >    related to a specific provider in *airflow.cfg*.
> > >    If we don't do that we would need to hardcode these configs when I
> (or
> > >    someone) adds support for Google's KMS and Hashicorp Vault :)
> > >    3. How do we authenticate to the Secrets/Creds Backend?
> > >    For Google Cloud KMS or Hashicorp Vault, the main way of
> > >    authenticating I think is using Environment Variable.
> > >    Of course (2) can solve this, for example, we can pass
> > >    *GOOGLE_APPLICATION_CREDENTIAL* JSON file path to
> *creds_backend_kwargs
> > >    *which would then authenticate with GCP
> > >    Or do we want to get that value from* Connection object stored in
> DB*?
> > >
> > >
> > > Regards,
> > > Kaxil
> > >
> > >
> > > On Sun, Mar 8, 2020 at 6:41 AM Daniel Standish <[email protected]>
> > > wrote:
> > >
> > >> Thanks Jarek for adding the PR link.  In addition to forgetting to
> include
> > >> that, I put the wrong AIP number in subject -- fixed now (AIP-33 is
> the
> > >> correct AIP).
> > >>
> > >> I like the name Secrets -- I will let the other ideas percolate and
> will
> > >> look into your suggestion about using the plugin system.
> > >>
> > >>
> > >> On Sat, Mar 7, 2020 at 8:09 PM Jarek Potiuk <[email protected]
> >
> > >> wrote:
> > >>
> > >> > +1. I like it. And I like the general direction of the name as well
> not
> > >> to
> > >> > be tied with connections.
> > >> >
> > >> > I think it's also worth mentioning that there is PR in progress
> about it
> > >> > here: https://github.com/apache/airflow/pull/6376/files
> > >> >
> > >> > I have one observation though. We might not necessarily go in this
> > >> > direction or maybe defer it to later discussion, but I thought about
> > >> > extending the use of such CredentialsBackend and make it simply a
> > >> > SecretsBackend (that would be a nice solution to discussing various
> > >> names
> > >> > for it :)).  Daniel - you might consider that as a "loose" comment.
> > >> >
> > >> > I do not want to hijack your topic here, so feel free to cut the
> > >> discussion
> > >> > here if you think it's going too far from the original proposal.
> > >> >
> > >> > Why don't we change it to a bit more generic "SecretsBackend" and
> > >> provide a
> > >> > more generic interface to use by the Hooks/Operators? I think the
> > >> current
> > >> > implementation is not very far from it. Additionally to just
> providing a
> > >> > "get_connection" method it could provide much more generic
> "get_secret"
> > >> and
> > >> > this would give the Hooks/Operators an opportunity to use it to
> retrieve
> > >> > all various kinds of secrets, not only Connections. We already have
> a
> > >> > half-baked solution for that with the _COMMAND variables and maybe
> we
> > >> can
> > >> > incorporate this in backwards-compatible way in the "SecretBackend'
> > >> > similarly as you did with Metastore and Env Backends.
> > >> >
> > >> > I think this would be a very handy feature - some of the secrets
> are not
> > >> > necessary in the URI form, they might be much more complex and
> contain
> > >> > better structured data and might be needed or available at different
> > >> times
> > >> > than when connection is retrieved. Often secrets are short-living
> and
> > >> they
> > >> > should be retrieved "just" before they are used - limiting it only
> to
> > >> when
> > >> > the Hook is created and only to URI is quite a limitation I think.
> > >> >
> > >> > Again - feel free to disregard that comment, I just thought it
> might be
> > >> a
> > >> > handy feature and maybe it's a good time to add it.
> > >> >
> > >> > J.
> > >> >
> > >> >
> > >> > On Sun, Mar 8, 2020 at 1:03 AM Daniel Standish <
> [email protected]>
> > >> > wrote:
> > >> >
> > >> > > We can currently retrieve connections from environment variables
> or
> > >> the
> > >> > > metastore database.
> > >> > >
> > >> > > AIP-33
> > >> > > <
> > >> >
> > >>
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-33+Creds+backend>
> > >> > > provides
> > >> > > a way to retrieve them from other sources, for example AWS SSM
> > >> parameter
> > >> > > store.
> > >> > >
> > >> > > There are many instances in airflow where we allow for user
> > >> customization
> > >> > > like this, for example with auth backend and hostname callable --
> it
> > >> just
> > >> > > hasn't been done yet with creds.
> > >> > >
> > >> > > *How is it implemented?*
> > >> > > This adds a base class BaseCredsBackend which takes over
> BaseHook's
> > >> > > get_connections method.  Then EnvironmentVariablesCredsBackend and
> > >> > > MetastoreCredsBackend are added as subclasses.  New
> implementations
> > >> can
> > >> > be
> > >> > > added and user can configure precedence.  E.g. instead of the
> default
> > >> env
> > >> > > vars > metastore, a user could specify SSM > metastore > env vars
> in
> > >> > > airflow.cfg.
> > >> > >
> > >> > > *Does this break anything?*
> > >> > > No. This is a relatively simple refactor that results in no
> change in
> > >> the
> > >> > > default behavior unless user modifies config to enable an
> alternative
> > >> > > backend.
> > >> > >
> > >> > > *Why is this worth adding?*
> > >> > > If implemented, you could store creds anywhere, as long as you can
> > >> write
> > >> > a
> > >> > > `get_connections` method that retrieves them.
> > >> > > This can make it possible for a team of devs to share one creds
> source
> > >> > > rather than each dev storing them in text files, for example.
> > >> > > Can also make it easier to spin up a dev instance since you don't
> > >> have to
> > >> > > worry about loading creds.
> > >> > > And if you don't have access to the airflow CLI (because you are
> on a
> > >> > cloud
> > >> > > platform perhaps) then this can provide an easy way to load and
> manage
> > >> > > creds.
> > >> > >
> > >> > > *Why do you call it creds?*
> > >> > > I don't need to call it creds.  I could call it
> BaseConnectionBackend
> > >> or
> > >> > > BaseConnectionInfoBackend or any other thing -- I don't care too
> much.
> > >> > > There was something about calling it connection backend that was
> > >> > > unsatisfying.  Perhaps because connection is a bit of an ambiguous
> > >> word.
> > >> > > So even though Connection is the name of the model that we
> instantiate
> > >> > when
> > >> > > retrieving these creds, creds seemed more specific and more
> > >> > > representative.  But again if there is consensus on something
> else, I
> > >> am
> > >> > > happy to change.
> > >> > > This adds a `creds` package under `airflow`.  Alternatively would
> we
> > >> > create
> > >> > > a `connections` package?  Would this cause confusion relative to
> the
> > >> > > *model*
> > >> > > Connection?
> > >> > > Another thought: even though today this backend will produce a
> > >> connection
> > >> > > object, maybe in the future it can produce a different model --
> either
> > >> > way,
> > >> > > this backend is still about retrieving *creds*.
> > >> > >
> > >> > > *Outstanding questions*
> > >> > > As mentioned above, some don't like using the word Creds because
> this
> > >> is
> > >> > > about producing connections.  I totally get this.  If there is
> general
> > >> > > consensus around another name, I am happy to change it.
> > >> > > I welcome any other suggestions or feedback on the structure of
> this
> > >> > > implementation.
> > >> > >
> > >> >
> > >> >
> > >> > --
> > >> >
> > >> > Jarek Potiuk
> > >> > Polidea <https://www.polidea.com/> | Principal Software Engineer
> > >> >
> > >> > M: +48 660 796 129 <+48660796129>
> > >> > [image: Polidea] <https://www.polidea.com/>
> > >> >
> > >>
> > >
>
>
>
> --
>
> Tomasz Urbaszek
> Polidea | Software Engineer
>
> M: +48 505 628 493
> E: [email protected]
>
> Unique Tech
> Check out our projects!
>

Reply via email to