Yep. That is exactly why I think we need the hooks. Would it be possible for you to donate your code for the Vault Hook implementation?
I would love to use it for my implementation (and to credit you, or whoever the author is, as co-author :)

J.

On Tue, 19 May 2020 at 09:41, Nathan Hadfield <nathan.hadfi...@king.com> wrote:

> Jarek,
>
> We are already using the secret backend for Airflow variables. But, because of the example I explained and also a programmatic need to update our GCP Airflow connections every day, we still have to maintain a secondary, custom method for Vault authentication and manipulation of other secrets.
>
> Cheers,
>
> Nathan
>
> On 18/05/2020, 20:07, "Jarek Potiuk" <jarek.pot...@polidea.com> wrote:
>
> Thanks Nathan,
>
> I think your case is a really good example of where the Hook might be really useful (and apparently somebody has already done it via Hooks).
>
> I wonder, Nathan: if you switch to the secret backend in the future, would you use the same secret backend for Airflow connections/variables? Or do you foresee that you will have another backend/credentials to access it?
>
> Maybe others have had similar experiences and would like to share them here?
>
> I still think there is a valid point in having separate hooks. These are my points:
>
> 1) It seems that the usage pattern is close to what I described: a separate secret backend that contains more "dynamic" secrets. And I think that still being able to use different connections is a nice way of accessing multiple backend credentials within Airflow core. I think there was a good reason why only one backend is considered for "core", and it is really ill-suited to support multiple credential backends. I can hardly imagine reading connections or variables from multiple secret backends. How would you choose which backend to use for different variables? Fallback mechanisms? I think it's hardly useful. Hooks, on the other hand (via connections), have a built-in way to choose different backends, and their usage pattern for custom operators is the really standard "Airflow" way.
>
> 2) A Python operator is not the best idea, because you need to provide credentials to access the secret backend. It can be done, of course, via environment variables, but using a connection from Airflow has the additional advantage of being encrypted at rest in the database. And with Hooks being the common denominator for accessing external services (a secret backend being one of them), a hook can hide all the authorisation and communication details from the operators using it (this is basically what a hook is for).
>
> 3) I think I have a good parallel here. I would compare my proposal to the current way we use Postgres and MySQL hooks vs. using SQLAlchemy for Airflow itself. While Airflow uses Postgres and MySQL to provide its internal database, it also has the "postgres" and "MySQL" providers that provide hooks that access the database in a "generic" way (and those hooks are used by a number of operators). We can still choose various databases to connect to via hooks, even if "Airflow core" uses that single database.
>
> J.
>
> On Mon, May 18, 2020 at 5:57 PM Nathan Hadfield <nathan.hadfi...@king.com> wrote:
>
> > Yep, I understand. I wasn't necessarily advocating for a Vault hook; just wanted to give some real world colour to the conversation and what we did to solve our needs prior to the secrets backend.
> >
> > I'm sure that extending the class would also enable the same functionality.
> >
> > Cheers,
> >
> > Nathan
> >
> > On 18/05/2020, 16:46, "Ash Berlin-Taylor" <a...@apache.org> wrote:
> >
> > Accessing things that aren't connections or variables is essentially creating a third class of thing that Secrets store.
> >
> > But that is a separate issue to what Jarek is proposing, which is Hooks.
> >
> > For your use case a Python operator sounds like the best fit.
> > A hook is going to have to target the lowest common denominator, which means vault-specific things are just a needless layer over the top.
> >
> > Extending the existing Secrets Backend interface to support that is doable, but I don't see the need for a Hook. Not everything needs to be a hook :)
> >
> > -ash
> >
> > On May 18 2020, at 4:41 pm, Nathan Hadfield <nathan.hadfi...@king.com> wrote:
> >
> > > Hey,
> > >
> > > My quick two cents are that it would be good to access secrets that are not explicitly either connections or variables.
> > >
> > > We have a need for DAGs that feature more complex interactions with Vault (which typically end up being custom operators) that I think would be helped by more generic capabilities.
> > >
> > > For example, we have an automated system that regularly rotates GCP service accounts across the whole company and stores them in Vault. We then have to ensure that our different Looker environments always have these SAs before the old ones expire every 48 hours. To do this, we wrote a Vault Hook and a Looker Hook and then combine them in an operator which reads every SA from a specific Vault path and then updates the connection inside Looker.
> > >
> > > I don’t know if this will influence your thinking in any way but just wanted to briefly share our experiences. If anyone would like to learn more then please reach out and I’d be happy to share more.
> > >
> > > Cheers,
> > >
> > > Nathan
> > >
> > > On 18/05/2020, 15:21, "Ash Berlin-Taylor" <a...@apache.org> wrote:
> > >
> > > > > > > The good thing with it is that you could easily have multiple secret backends configured to retrieve secrets for a specific "service" (so that you could keep "generic airflow's secrets" in one backend but still have the possibility of custom operators using other backends (with different authentication, scopes etc.)).
> > > > > >
> > > > > > Having the ability to configure multiple secrets backends is independent of this feature. The original PR/AIP to add Secrets Backends decided to leave this ability out as it was more complex to configure. We could add that back in.
> > > > > >
> > > > > > I still don't quite get from your example where you are proposing this would be used? Can you give a fuller example please? Do you have a concrete use case where you need this?
> > > > > >
> > > > > > Not everything in Airflow needs to be a hook; just access the secrets backend directly. I'm not sure what wrapping an extra layer around these classes gives us?
> > > > > >
> > > > > > Without a concrete example I can't see anything other than this adds a lot of complexity.
> > > > > >
> > > > > > -ash
> > > > > >
> > > > > > On May 18 2020, at 2:45 pm, Jarek Potiuk <jarek.pot...@polidea.com> wrote:
> > > > > >
> > > > > > > Hello Everyone,
> > > > > > >
> > > > > > > TL;DR: I was just about to start work on a small set of Hooks dedicated to retrieving secrets from the Secret Backend.
> > > > > > > I discussed it with Ash and Kamil <https://urldefense.proofpoint.com/v2/url?u=https-3A__apache-2Dairflow.slack.com_archives_C0145R4NPS5_p1589805908013700&d=DwICaQ&c=-0jfte1J3SKEE6FyZmTngg&r=cgex0jmJ1tJ3A5nVgQ7Pjo7sdo3NkXzIHPolJPlCwBw&m=AIc65Hls2sR87-APqi_0oh4N0F-NOUCC2ulfyS04GGU&s=NBBItsFcPZR-C26VepQEehBPNPEWUsxar_DatX5ulco&e=> on Slack today. So far I thought I would treat them as usual providers, but Ash raised some valid concerns, so I wanted to raise the proposal before I start working on it.
> > > > > > >
> > > > > > > *Context:*
> > > > > > >
> > > > > > > Currently we have "Secret Backend" support built into 2.0 and 1.10.10+. It includes retrieving variables and connections (via a Secret Manager class) for:
> > > > > > >
> > > > > > > - Hashicorp Vault
> > > > > > > - Secret Manager
> > > > > > > - KMS
> > > > > > > - AWS secret manager
> > > > > > >
> > > > > > > Those secret managers are configured in:
> > > > > > >
> > > > > > > [secrets]
> > > > > > > backend=<SecretManagerClass>
> > > > > > > backend_kwargs={}
> > > > > > >
> > > > > > > Those are available for use in a nice way (via Jinja templates and the like), but they need support in the core of Airflow (so they require 1.10.10+). This means that if you are on pre-1.10.10 you cannot use those secrets. Currently you can only use one secret backend per whole Airflow installation, so if your secrets are split between several secret managers (or if secrets for a particular service require different credentials), you cannot use the mechanism to access such distributed secrets.
> > > > > > > It's not often the case, but I can very well imagine that there are different sets of credentials to access different secrets: some services might need different scopes/levels of access.
> > > > > > >
> > > > > > > *Proposal*
> > > > > > >
> > > > > > > We have an idea that we might also want (on top of the above SecretManager implementation) to define generic Hooks for accessing secrets from those services (just generic secrets, not connections or variables). Simply treat each of the backends above as another "provider" and create a Hook to access the service. Such a Hook could have just one method:
> > > > > > >
> > > > > > > def get_secret(self, path_prefix: str, secret_id: str) -> Optional[str]
> > > > > > >
> > > > > > > It would use a connection defined (as usual) in ENV variables or the Airflow database to authenticate with the secret service and retrieve the secrets.
> > > > > > >
> > > > > > > The good thing with it is that you could easily have multiple secret backends configured to retrieve secrets for a specific "service" (so that you could keep "generic airflow's secrets" in one backend but still have the possibility of custom operators using other backends (with different authentication, scopes etc.)). And it is not touching any of the "core" of Airflow. It's just a set of hooks with corresponding connections that work the same way as accessing any other provider in Airflow. No core of Airflow will be touched with this change.
> > > > > > >
> > > > > > > *Pros/Cons*
> > > > > > >
> > > > > > > *Con:*
> > > > > > >
> > > > > > > I do realise it is a bit of duplication in functionality.
> > > > > > > We already have a way to connect to a secret backend via Airflow configuration and we should likely promote it rather than introduce an additional mechanism.
> > > > > > >
> > > > > > > *Pros:*
> > > > > > >
> > > > > > > * Most of all, it adds the flexibility of accessing several secret backends for different use cases. So far I have looked at those hooks as merely another set of "provider hooks". For me this is no different from the "providers" for any other services we have. For example, the "cloudant" provider has only "CloudantHook", which other custom operators can use. And I can well imagine it might actually be even more convenient to configure connections in the DB and access secrets this way rather than having to configure Secret Backends in Airflow configuration.
> > > > > > >
> > > > > > > * The duplication is very, very limited (basically a method call to the secret backend).
> > > > > > >
> > > > > > > * Another benefit is that it would allow people still stuck on pre-1.10.10 to write custom operators that use secret backends (via backport operators), and to continue doing so in the future (possibly migrating to 2.0/1.10.10+ in cases where there is only one secret backend, but continuing to use connections/hooks where some specific secrets should be kept in a different secret backend).
> > > > > > >
> > > > > > > I would like to hear your opinion on that.
> > > > > > >
> > > > > > > J.
> > > > > > >
> > > > > > > --
> > > > > > >
> > > > > > > Jarek Potiuk
> > > > > > > Polidea <https://urldefense.proofpoint.com/v2/url?u=https-3A__www.polidea.com_&d=DwICaQ&c=-0jfte1J3SKEE6FyZmTngg&r=cgex0jmJ1tJ3A5nVgQ7Pjo7sdo3NkXzIHPolJPlCwBw&m=AIc65Hls2sR87-APqi_0oh4N0F-NOUCC2ulfyS04GGU&s=r-sJroKu0X4XYmnboaHwbpbEIgk5TLwTErkDtwEgvog&e=> | Principal Software Engineer
> > > > > > >
> > > > > > > M: +48 660 796 129 <+48660796129>
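
For reference, the single-method hook proposed in this thread (get_secret with a path prefix and a secret id, one hook instance per connection) could look roughly like the sketch below. It is only an illustration under loud assumptions: it deliberately does not import Airflow, the class names BaseSecretsHook and InMemorySecretsHook and the connection ids are hypothetical, and a dictionary stands in for a real Vault or cloud secret manager client. Only the get_secret signature comes from the proposal itself.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class BaseSecretsHook(ABC):
    """Hypothetical base class; a real hook would presumably subclass Airflow's BaseHook."""

    def __init__(self, conn_id: str) -> None:
        # In Airflow, this connection id would be resolved to credentials
        # stored in environment variables or the metadata database.
        self.conn_id = conn_id

    @abstractmethod
    def get_secret(self, path_prefix: str, secret_id: str) -> Optional[str]:
        """Return the secret at path_prefix/secret_id, or None if it is absent."""


class InMemorySecretsHook(BaseSecretsHook):
    """Illustrative stand-in for a real secret service client (assumption)."""

    def __init__(self, conn_id: str, store: Dict[str, str]) -> None:
        super().__init__(conn_id)
        self._store = store

    def get_secret(self, path_prefix: str, secret_id: str) -> Optional[str]:
        # Normalise the prefix so "a/b" and "a/b/" address the same secret.
        key = f"{path_prefix.rstrip('/')}/{secret_id}"
        return self._store.get(key)


# Two independent "backends", each reached through its own connection id,
# which is the flexibility the proposal argues for.
core_hook = InMemorySecretsHook(
    "secrets_core", {"airflow/connections/db": "postgres://example"}
)
rotating_hook = InMemorySecretsHook(
    "secrets_rotating", {"gcp/service-accounts/looker": "sa-key-placeholder"}
)
```

A custom operator would then pick the hook by connection id, for example rotating_hook.get_secret("gcp/service-accounts", "looker"), while the core backend stays untouched; a lookup for a secret that lives in the other backend simply returns None.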