I am also +1 for the idea of secrets backend. And agree with Fokko
that we should support other backends as well.

T.


On Mon, Mar 16, 2020 at 11:19 AM Kaxil Naik <[email protected]> wrote:
>
> Bumping this thread again. We have 2 "+1" votes on it currently (Me &
> Jarek).
>
> Would love to hear other's opinion and thoughts. Personally I feel this is
> a great addition and the one I was looking forward to for a long time.
>
> Regards,
> Kaxil
>
> On Sun, Mar 8, 2020 at 8:33 PM Kaxil Naik <[email protected]> wrote:
>
> > Thanks, Daniel, a *big "+1"* to the AIP and the idea of integrating
> > external vaults and using them as connections. Was in the roadmap of items
> > I wanted to work on so very happy that it is coming together and we are
> > already discussing it.
> >
> > I have also looked at the PR and for the most part, it looks good to me.
> > Following are my comments
> >
> >
> >    1. "*SecretsBackend*" is a more intuitive name in my opinion (we can
> >    have a vote in case we don't come to a conclusion :D)
> >    2. I think we should have *creds_backend_kwargs* (either separate
> >    section or a config allowing JSON-string) or something similar to avoid
> >    hardcoding configs (like *aws_ssm_prefix*/*aws_ssm_profile_name*)
> >    related to a specific provider in *airflow.cfg*.
> >    If we don't do that we would need to hardcode these configs when I (or
> >    someone) adds support for Google's KMS and Hashicorp Vault :)
> >    3. How do we authenticate to the Secrets/Creds Backend?
> >    For Google Cloud KMS or Hashicorp Vault, the main way of
> >    authenticating I think is using Environment Variable.
> >    Of course (2) can solve this, for example, we can pass
> >    *GOOGLE_APPLICATION_CREDENTIAL* JSON file path to *creds_backend_kwargs
> >    *which would then authenticate with GCP
> >    Or do we want to get that value from* Connection object stored in DB*?
> >
> >
> > Regards,
> > Kaxil
> >
> >
> > On Sun, Mar 8, 2020 at 6:41 AM Daniel Standish <[email protected]>
> > wrote:
> >
> >> Thanks Jarek for adding the PR link.  In addition to forgetting to include
> >> that, I put the wrong AIP number in subject -- fixed now (AIP-33 is the
> >> correct AIP).
> >>
> >> I like the name Secrets -- I will let the other ideas percolate and will
> >> look into your suggestion about using the plugin system.
> >>
> >>
> >> On Sat, Mar 7, 2020 at 8:09 PM Jarek Potiuk <[email protected]>
> >> wrote:
> >>
> >> > +1. I like it. And I like the general direction of the name as well not
> >> to
> >> > be tied with connections.
> >> >
> >> > I think it's also worth mentioning that there is PR in progress about it
> >> > here: https://github.com/apache/airflow/pull/6376/files
> >> >
> >> > I have one observation though. We might not necessarily go in this
> >> > direction or maybe defer it to later discussion, but I thought about
> >> > extending the use of such CredentialsBackend and make it simply a
> >> > SecretsBackend (that would be a nice solution to discussing various
> >> names
> >> > for it :)).  Daniel - you might consider that as a "loose" comment.
> >> >
> >> > I do not want to hijack your topic here, so feel free to cut the
> >> discussion
> >> > here if you think it's going too far from the original proposal.
> >> >
> >> > Why don't we change it to a bit more generic "SecretsBackend" and
> >> provide a
> >> > more generic interface to use by the Hooks/Operators? I think the
> >> current
> >> > implementation is not very far from it. Additionally to just providing a
> >> > "get_connection" method it could provide much more generic "get_secret"
> >> and
> >> > this would give the Hooks/Operators an opportunity to use it to retrieve
> >> > all various kinds of secrets, not only Connections. We already have a
> >> > half-baked solution for that with the _COMMAND variables and maybe we
> >> can
> >> > incorporate this in backwards-compatible way in the "SecretBackend'
> >> > similarly as you did with Metastore and Env Backends.
> >> >
> >> > I think this would be a very handy feature - some of the secrets are not
> >> > necessary in the URI form, they might be much more complex and contain
> >> > better structured data and might be needed or available at different
> >> times
> >> > than when connection is retrieved. Often secrets are short-living and
> >> they
> >> > should be retrieved "just" before they are used - limiting it only to
> >> when
> >> > the Hook is created and only to URI is quite a limitation I think.
> >> >
> >> > Again - feel free to disregard that comment, I just thought it might be
> >> a
> >> > handy feature and maybe it's a good time to add it.
> >> >
> >> > J.
> >> >
> >> >
> >> > On Sun, Mar 8, 2020 at 1:03 AM Daniel Standish <[email protected]>
> >> > wrote:
> >> >
> >> > > We can currently retrieve connections from environment variables or
> >> the
> >> > > metastore database.
> >> > >
> >> > > AIP-33
> >> > > <
> >> >
> >> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-33+Creds+backend>
> >> > > provides
> >> > > a way to retrieve them from other sources, for example AWS SSM
> >> parameter
> >> > > store.
> >> > >
> >> > > There are many instances in airflow where we allow for user
> >> customization
> >> > > like this, for example with auth backend and hostname callable -- it
> >> just
> >> > > hasn't been done yet with creds.
> >> > >
> >> > > *How is it implemented?*
> >> > > This adds a base class BaseCredsBackend which takes over BaseHook's
> >> > > get_connections method.  Then EnvironmentVariablesCredsBackend and
> >> > > MetastoreCredsBackend are added as subclasses.  New implementations
> >> can
> >> > be
> >> > > added and user can configure precedence.  E.g. instead of the default
> >> env
> >> > > vars > metastore, a user could specify SSM > metastore > env vars in
> >> > > airflow.cfg.
> >> > >
> >> > > *Does this break anything?*
> >> > > No. This is a relatively simple refactor that results in no change in
> >> the
> >> > > default behavior unless user modifies config to enable an alternative
> >> > > backend.
> >> > >
> >> > > *Why is this worth adding?*
> >> > > If implemented, you could store creds anywhere, as long as you can
> >> write
> >> > a
> >> > > `get_connections` method that retrieves them.
> >> > > This can make it possible for a team of devs to share one creds source
> >> > > rather than each dev storing them in text files, for example.
> >> > > Can also make it easier to spin up a dev instance since you don't
> >> have to
> >> > > worry about loading creds.
> >> > > And if you don't have access to the airflow CLI (because you are on a
> >> > cloud
> >> > > platform perhaps) then this can provide an easy way to load and manage
> >> > > creds.
> >> > >
> >> > > *Why do you call it creds?*
> >> > > I don't need to call it creds.  I could call it BaseConnectionBackend
> >> or
> >> > > BaseConnectionInfoBackend or any other thing -- I don't care too much.
> >> > > There was something about calling it connection backend that was
> >> > > unsatisfying.  Perhaps because connection is a bit of an ambiguous
> >> word.
> >> > > So even though Connection is the name of the model that we instantiate
> >> > when
> >> > > retrieving these creds, creds seemed more specific and more
> >> > > representative.  But again if there is consensus on something else, I
> >> am
> >> > > happy to change.
> >> > > This adds a `creds` package under `airflow`.  Alternatively would we
> >> > create
> >> > > a `connections` package?  Would this cause confusion relative to the
> >> > > *model*
> >> > > Connection?
> >> > > Another thought: even though today this backend will produce a
> >> connection
> >> > > object, maybe in the future it can produce a different model -- either
> >> > way,
> >> > > this backend is still about retrieving *creds*.
> >> > >
> >> > > *Outstanding questions*
> >> > > As mentioned above, some don't like using the word Creds because this
> >> is
> >> > > about producing connections.  I totally get this.  If there is general
> >> > > consensus around another name, I am happy to change it.
> >> > > I welcome any other suggestions or feedback on the structure of this
> >> > > implementation.
> >> > >
> >> >
> >> >
> >> > --
> >> >
> >> > Jarek Potiuk
> >> > Polidea <https://www.polidea.com/> | Principal Software Engineer
> >> >
> >> > M: +48 660 796 129 <+48660796129>
> >> > [image: Polidea] <https://www.polidea.com/>
> >> >
> >>
> >



-- 

Tomasz Urbaszek
Polidea | Software Engineer

M: +48 505 628 493
E: [email protected]

Unique Tech
Check out our projects!

Reply via email to