Yep. That is exactly why I think we need the hooks. Is it possible for you to
donate your code for the Vault Hook implementation?

I would love to use it for my implementation (and make you, or whoever the
author is, a co-author :)

J.


On Tue, 19 May 2020, 09:41 Nathan Hadfield <nathan.hadfi...@king.com>
wrote:

> Jarek,
>
> We are already using the secret backend for Airflow variables. But because
> of the example I explained, and also a programmatic need to update our GCP
> Airflow connections every day, we still have to maintain a secondary,
> custom method for Vault authentication and for manipulating other secrets.
>
> Cheers,
>
> Nathan
>
> On 18/05/2020, 20:07, "Jarek Potiuk" <jarek.pot...@polidea.com> wrote:
>
>     Thanks Nathan,
>
>     I think your case is a really good example of where the Hook might be
>     really useful (and apparently somebody already did it via Hooks).
>
>     Nathan, I wonder: if you (in the future) switch to the secret backend,
>     would you use the same secret backend for Airflow connections/variables?
>     Or do you foresee having another backend/credentials to access it?
>
>     Maybe others have had similar experiences and would like to share them
>     here?
>
>     I still think there is a valid point in having separate hooks. These are
>     my points:
>
>     1) It seems that the usage pattern is close to what I described: a
>     separate secret backend that contains more "dynamic" secrets. And I
>     still think being able to use different connections is a nice way of
>     accessing multiple backend credentials within Airflow core. I think
>     there was a good reason why only one backend is considered for "core",
>     and it is really ill-suited to supporting multiple credential backends.
>     I can hardly imagine reading connections or variables from multiple
>     secret backends. How would you choose which backend to use for
>     different variables? Fallback mechanisms? I think that's hardly useful.
>     Hooks, on the other hand, have a built-in way (via connections) to
>     choose different backends, and their usage pattern in custom operators
>     is the standard "Airflow" way.
>
>     2) A Python operator is not the best idea, because you need to provide
>     credentials to access the secret backend. It can be done, of course, via
>     environment variables, but using a connection from Airflow has the
>     additional advantage of being encrypted at rest in the database. And
>     with Hooks being the common denominator for accessing external services
>     (a secret backend being one of them), a hook can hide all the
>     authorisation and communication details from the operators using it
>     (this is basically what a hook is for).
>
>     3) I think I have a good parallel here. I would compare my proposal to
>     the current way we use the Postgres and MySQL hooks vs. using SQLAlchemy
>     for Airflow itself. While Airflow uses Postgres or MySQL for its
>     internal database, it also has the "postgres" and "mysql" providers,
>     which provide hooks that access the database in a "generic" way (and
>     those hooks are used by a number of operators). We can still choose
>     various databases to connect to via hooks, even if "Airflow core" uses
>     that single database.
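Point 1 above asks how a fallback across several backends would decide which value to return. A minimal sketch of one possible policy (first match wins), in plain Python; the `first_match` helper and the dict-backed "backends" are hypothetical stand-ins, not Airflow's secrets API:

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-in: a "backend" is a (name, lookup) pair whose lookup
# returns None when the key is missing. This is not Airflow's API.
Backend = Tuple[str, Callable[[str], Optional[str]]]


def first_match(backends: List[Backend], key: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (backend_name, value) from the first backend that has `key`."""
    for name, lookup in backends:
        value = lookup(key)
        if value is not None:
            return name, value
    return None, None


team_vault = {"db_password": "team-secret"}
central_vault = {"db_password": "central-secret", "api_key": "central-key"}

chain = [("team", team_vault.get), ("central", central_vault.get)]
print(first_match(chain, "db_password"))  # ('team', 'team-secret')
print(first_match(chain, "api_key"))      # ('central', 'central-key')

# Same key, reversed order: a different secret comes back.
print(first_match(list(reversed(chain)), "db_password"))  # ('central', 'central-secret')
```

The answer for the same key changes with list order, and a key missing from one store silently falls through to the next. A hook bound to an explicit conn_id names its backend instead, which is the alternative argued for above.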
>
>     J.
>
>     On Mon, May 18, 2020 at 5:57 PM Nathan Hadfield <nathan.hadfi...@king.com>
>     wrote:
>
>     > Yep, I understand. I wasn't necessarily advocating for a Vault hook; I
>     > just wanted to give some real-world colour to the conversation and show
>     > what we did to solve our needs prior to the secrets backend.
>     >
>     > I'm sure that extending the class would also enable the same
>     > functionality.
>     >
>     > Cheers,
>     >
>     > Nathan
>     >
>     > On 18/05/2020, 16:46, "Ash Berlin-Taylor" <a...@apache.org> wrote:
>     >
>     >     Accessing things that aren't connections or variables is essentially
>     >     creating a third class of thing that the Secrets backends store.
>     >
>     >     But that is a separate issue from what Jarek is proposing, which is
>     >     Hooks.
>     >
>     >     For your use case a Python operator sounds like the best fit. A hook
>     >     is going to have to target the lowest common denominator, which
>     >     means that for Vault-specific things it is just a needless layer
>     >     over the top.
>     >
>     >     Extending the existing Secrets Backend interface to support that is
>     >     doable, but I don't see the need for a Hook. Not everything needs to
>     >     be a hook :)
>     >
>     >     -ash
>     >
>     >
>     >     On May 18 2020, at 4:41 pm, Nathan Hadfield <nathan.hadfi...@king.com>
>     >     wrote:
>     >
>     >     > Hey,
>     >     >
>     >     > My quick two cents: it would be good to be able to access secrets
>     >     > that are not explicitly either connections or variables.
>     >     >
>     >     > We have a need for DAGs that feature more complex interactions with
>     >     > Vault (which typically end up being custom operators), and I think
>     >     > they would be helped by more generic capabilities.
>     >     >
>     >     > For example, we have an automated system that regularly rotates GCP
>     >     > service accounts across the whole company and stores them in Vault.
>     >     > We then have to ensure that our different Looker environments always
>     >     > have these SAs before the old ones expire every 48 hours. To do
>     >     > this, we wrote a Vault Hook and a Looker Hook and then combined them
>     >     > in an operator which reads every SA from a specific Vault path and
>     >     > then updates the connection inside Looker.
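A minimal sketch of the sync step such an operator might perform. The two interfaces (`list_secrets` on the Vault side, `get_credential`/`update_credential` on the Looker side) and the in-memory fakes are hypothetical, invented for illustration; they are not hvac, Vault HTTP API or Looker SDK methods.

```python
from typing import Dict, List, Protocol


class SecretSource(Protocol):
    """Hypothetical read side, e.g. a Vault hook (not a real hvac/Airflow API)."""

    def list_secrets(self, path: str) -> Dict[str, str]:
        """Return {service_account_name: key} for every secret under `path`."""
        ...


class CredentialTarget(Protocol):
    """Hypothetical write side, e.g. a Looker hook (not the real Looker SDK)."""

    def get_credential(self, connection_name: str) -> str:
        ...

    def update_credential(self, connection_name: str, credential: str) -> None:
        ...


def sync_service_accounts(source: SecretSource, target: CredentialTarget, vault_path: str) -> List[str]:
    """Push every key found under `vault_path` to the target; return updated names.

    Only connections whose stored key differs are updated, so running this
    more often than the 48-hour rotation window is harmless.
    """
    updated = []
    for name, key in sorted(source.list_secrets(vault_path).items()):
        if target.get_credential(name) != key:
            target.update_credential(name, key)
            updated.append(name)
    return updated


# In-memory fakes standing in for the real hooks.
class FakeVault:
    def __init__(self, data: Dict[str, Dict[str, str]]) -> None:
        self.data = data

    def list_secrets(self, path: str) -> Dict[str, str]:
        return dict(self.data.get(path, {}))


class FakeLooker:
    def __init__(self) -> None:
        self.connections: Dict[str, str] = {}

    def get_credential(self, connection_name: str) -> str:
        return self.connections.get(connection_name, "")

    def update_credential(self, connection_name: str, credential: str) -> None:
        self.connections[connection_name] = credential


vault = FakeVault({"gcp/looker": {"sa-prod": "key-v2", "sa-dev": "key-v7"}})
looker = FakeLooker()
print(sync_service_accounts(vault, looker, "gcp/looker"))  # ['sa-dev', 'sa-prod']
print(sync_service_accounts(vault, looker, "gcp/looker"))  # []
```

Typing the sync step against two small interfaces rather than concrete clients lets each hook keep its own authentication details to itself.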
>     >     >
>     >     > I don’t know if this will influence your thinking in any way, but I
>     >     > just wanted to briefly share our experiences. If anyone would like
>     >     > to learn more, please reach out and I’d be happy to share more.
>     >     >
>     >     > Cheers,
>     >     >
>     >     > Nathan
>     >     >
>     >     > On 18/05/2020, 15:21, "Ash Berlin-Taylor" <a...@apache.org> wrote:
>     >     >
>     >     >    > The good thing with it is that you could easily have multiple
>     >     >    > secret backends configured to retrieve secrets for a specific
>     >     >    > "service" (so that you could keep "generic Airflow secrets" in
>     >     >    > one backend but still have the possibility of custom operators
>     >     >    > using other backends (with different authentication, scopes
>     >     >    > etc.).
>     >     >
>     >     >    Having the ability to configure multiple secrets backends is
>     >     >    independent of this feature. The original PR/AIP to add Secrets
>     >     >    Backends decided to leave this ability out as it was more complex
>     >     >    to configure. We could add that back in.
>     >     >
>     >     >    I still don't quite get from your example where you are proposing
>     >     >    this would be used. Can you give a fuller example, please? Do you
>     >     >    have a concrete use case where you need this?
>     >     >
>     >     >    Not everything in Airflow needs to be a hook; just access the
>     >     >    secrets backend directly. I'm not sure what wrapping an extra
>     >     >    layer around these classes gives us.
>     >     >
>     >     >    Without a concrete example I can't see this as anything other
>     >     >    than adding a lot of complexity.
>     >     >
>     >     >    -ash
>     >     >
>     >     >    On May 18 2020, at 2:45 pm, Jarek Potiuk <jarek.pot...@polidea.com>
>     >     >    wrote:
>     >     >
>     >     >    > Hello Everyone,
>     >     >    >
>     >     >    > TL;DR: I was just about to start working on a small set of
>     >     >    > Hooks dedicated to retrieving secrets from the Secret Backend. I
>     >     >    > discussed it with Ash and Kamil
>     >     >    > <https://urldefense.proofpoint.com/v2/url?u=https-3A__apache-2Dairflow.slack.com_archives_C0145R4NPS5_p1589805908013700&d=DwICaQ&c=-0jfte1J3SKEE6FyZmTngg&r=cgex0jmJ1tJ3A5nVgQ7Pjo7sdo3NkXzIHPolJPlCwBw&m=AIc65Hls2sR87-APqi_0oh4N0F-NOUCC2ulfyS04GGU&s=NBBItsFcPZR-C26VepQEehBPNPEWUsxar_DatX5ulco&e=>
>     >     >    > on Slack today. So far I thought I would treat them as usual
>     >     >    > providers, but Ash raised some valid concerns, so I wanted to
>     >     >    > raise the proposal before I start working on it.
>     >     >    >
>     >     >    > *Context:*
>     >     >    >
>     >     >    > Currently we have "Secret Backend" support built in to 2.0 and
>     >     >    > 1.10.10+. It includes retrieving variables and connections (via
>     >     >    > a Secret Manager class) for:
>     >     >    >
>     >     >    >   -  HashiCorp Vault
>     >     >    >   -  Secret Manager
>     >     >    >   -  KMS
>     >     >    >   -  AWS secret manager
>     >     >    >
>     >     >    > Those secret managers are configured in:
>     >     >    >
>     >     >    > [secrets]
>     >     >    > backend=<SecretManagerClass>
>     >     >    > backend_kwargs={}
>     >     >    >
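As a rough illustration of what those two settings mean: `backend` names a class by its dotted path, and `backend_kwargs` is a JSON object passed to that class as keyword arguments. The sketch below mirrors that idea with the standard library; it is not Airflow's actual loading code, `load_secrets_backend` is a hypothetical name, and `types.SimpleNamespace` stands in for a real backend class so the example runs anywhere. The kwarg values are illustrative only.

```python
import configparser
import importlib
import json
from typing import Any


def load_secrets_backend(config_text: str) -> Any:
    """Instantiate the class named by [secrets] backend with [secrets] backend_kwargs."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    section = parser["secrets"]
    # "package.module.ClassName" -> module path and class name
    module_path, _, class_name = section["backend"].rpartition(".")
    backend_cls = getattr(importlib.import_module(module_path), class_name)
    # backend_kwargs is a JSON object; an empty or missing value means no kwargs
    kwargs = json.loads(section.get("backend_kwargs", "") or "{}")
    return backend_cls(**kwargs)


example = """
[secrets]
backend = types.SimpleNamespace
backend_kwargs = {"connections_path": "connections", "url": "http://127.0.0.1:8200"}
"""
backend = load_secrets_backend(example)
print(backend.connections_path)  # connections
```

Because the whole installation reads this single section, there is exactly one configured backend, which is the limitation discussed next.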
>     >     >    > These are available for use in a nice way (via Jinja templates
>     >     >    > and the like), but they need support in the core of Airflow (so
>     >     >    > they require 1.10.10+). This means that if you are on a version
>     >     >    > before 1.10.10, you cannot use those secrets.
>     >     >    >
>     >     >    > Currently you can only use one secret backend per Airflow
>     >     >    > installation, so if your secrets are split between several secret
>     >     >    > managers (or if secrets for a particular service require different
>     >     >    > credentials), you cannot use the mechanism to access such
>     >     >    > distributed secrets. It's not a common case, but I can very well
>     >     >    > imagine that there might be different sets of credentials to
>     >     >    > access different secrets: some services might need different
>     >     >    > scopes/levels of access.
>     >     >    >
>     >     >    > *Proposal*
>     >     >    >
>     >     >    > We have an idea that we might also want to define (on top of the
>     >     >    > above SecretManager implementation) generic Hooks for accessing
>     >     >    > secrets from those services (just generic secrets, not
>     >     >    > connections or variables). Simply treat each of the backends
>     >     >    > above as another "provider" and create a Hook to access the
>     >     >    > service. Such a Hook could have just one method:
>     >     >    >
>     >     >    > def get_secret(self, path_prefix: str, secret_id: str) -> Optional[str]
>     >     >    >
>     >     >    > It would use a connection defined (as usual) in ENV variables or
>     >     >    > in the Airflow database to authenticate with the secret service
>     >     >    > and retrieve the secrets.
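A minimal sketch of the proposed interface, written as plain Python so it runs without Airflow. Only two things come from the thread or from Airflow itself: the proposed `get_secret` signature, and the convention of supplying a connection as a URI in an `AIRFLOW_CONN_<CONN_ID>` environment variable. `ConnectionInfo`, `connection_from_env`, `BaseSecretsHook` and `InMemorySecretsHook` are hypothetical names; a real hook would subclass Airflow's `BaseHook`, obtain its connection via `get_connection`, and call Vault, AWS or GCP instead of a dict.

```python
import os
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Optional
from urllib.parse import unquote, urlsplit


@dataclass
class ConnectionInfo:
    """Hypothetical, minimal stand-in for an Airflow Connection."""

    conn_type: str
    host: Optional[str]
    port: Optional[int]
    login: Optional[str]
    password: Optional[str]


def connection_from_env(conn_id: str) -> ConnectionInfo:
    """Parse the connection URI stored in AIRFLOW_CONN_<CONN_ID>."""
    uri = os.environ["AIRFLOW_CONN_" + conn_id.upper()]
    parts = urlsplit(uri)
    return ConnectionInfo(
        conn_type=parts.scheme,
        host=parts.hostname,
        port=parts.port,
        # Credentials in the URI are percent-encoded, so decode them.
        login=unquote(parts.username) if parts.username else None,
        password=unquote(parts.password) if parts.password else None,
    )


class BaseSecretsHook(ABC):
    """Hypothetical base class exposing the single method from the proposal."""

    def __init__(self, conn_id: str) -> None:
        self.conn_id = conn_id
        # The connection carries whatever is needed to authenticate to the backend.
        self.connection = connection_from_env(conn_id)

    @abstractmethod
    def get_secret(self, path_prefix: str, secret_id: str) -> Optional[str]:
        """Return the secret stored at <path_prefix>/<secret_id>, or None."""


class InMemorySecretsHook(BaseSecretsHook):
    """Toy backend for illustration; a real hook would call Vault, AWS, GCP, etc."""

    def __init__(self, conn_id: str, store: Dict[str, str]) -> None:
        super().__init__(conn_id)
        self._store = store

    def get_secret(self, path_prefix: str, secret_id: str) -> Optional[str]:
        return self._store.get(f"{path_prefix}/{secret_id}")


os.environ["AIRFLOW_CONN_VAULT_TEAM_A"] = "vault://:s3cr%2Ft@vault.example.com:8200"
hook = InMemorySecretsHook("vault_team_a", store={"looker/sa-prod": "key-v2"})
print(hook.connection.host, hook.connection.port)  # vault.example.com 8200
print(hook.get_secret("looker", "sa-prod"))        # key-v2
print(hook.get_secret("looker", "sa-missing"))     # None
```

Two hooks created with different conn_ids can point at different backends with different credentials, which is the flexibility the proposal is after.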
>     >     >    >
>     >     >    > The good thing with it is that you could easily have multiple
>     >     >    > secret backends configured to retrieve secrets for a specific
>     >     >    > "service" (so that you could keep "generic Airflow secrets" in
>     >     >    > one backend but still have the possibility of custom operators
>     >     >    > using other backends (with different authentication, scopes
>     >     >    > etc.). And it does not touch any of the "core" of Airflow. It's
>     >     >    > just a set of hooks with corresponding connections that work the
>     >     >    > same way as accessing any other provider in Airflow. No core of
>     >     >    > Airflow will be touched by this change.
>     >     >    >
>     >     >    > *Pros/Cons*
>     >     >    >
>     >     >    > *Con:*
>     >     >    >
>     >     >    > I do realise it is a bit of duplication of functionality. We
>     >     >    > already have a way to connect to a secret backend via Airflow
>     >     >    > configuration, and we should likely promote it rather than
>     >     >    > introduce an additional mechanism.
>     >     >    >
>     >     >    > *Pros:*
>     >     >    >
>     >     >    > * Most of all, it adds the flexibility of accessing several
>     >     >    > secret backends for different use cases. So far I have looked at
>     >     >    > these hooks as merely another set of "provider hooks". For me
>     >     >    > this is no different from the "providers" for any other services
>     >     >    > we have. For example, the "cloudant" provider has only a
>     >     >    > "CloudantHook" that other custom operators can use. And I can
>     >     >    > well imagine it might actually be even more convenient to
>     >     >    > configure connections in the DB and access secrets this way
>     >     >    > rather than having to configure Secret Backends in the Airflow
>     >     >    > configuration.
>     >     >    >
>     >     >    > * The duplication there is very, very limited (basically a method
>     >     >    > call to the secret backend).
>     >     >    >
>     >     >    > * Another benefit is that it would allow people still stuck on
>     >     >    > pre-1.10.10 versions to write custom operators that use secret
>     >     >    > backends (via backport operators), and to continue doing so in
>     >     >    > the future (possibly migrating to 2.0/1.10.10+ in cases where
>     >     >    > there is only one secret backend, but continuing to use
>     >     >    > connections/hooks where some specific secrets should be kept in
>     >     >    > a different secret backend).
>     >     >    >
>     >     >    > I would like to hear your opinion on that.
>     >     >    >
>     >     >    > J.
>     >     >    >
>     >     >    > --
>     >     >    >
>     >     >    > Jarek Potiuk
>     >     >    > Polidea
>     >     >    > <https://urldefense.proofpoint.com/v2/url?u=https-3A__www.polidea.com_&d=DwICaQ&c=-0jfte1J3SKEE6FyZmTngg&r=cgex0jmJ1tJ3A5nVgQ7Pjo7sdo3NkXzIHPolJPlCwBw&m=AIc65Hls2sR87-APqi_0oh4N0F-NOUCC2ulfyS04GGU&s=r-sJroKu0X4XYmnboaHwbpbEIgk5TLwTErkDtwEgvog&e=>
>     >     >    > | Principal Software Engineer
>     >     >    >
>     >     >    > M: +48 660 796 129
>     >     >
>     >
>