potiuk edited a comment on pull request #19857:
URL: https://github.com/apache/airflow/pull/19857#issuecomment-980781255


   > @potiuk You don't necessary need to write a custom secrets backend in 
order to have secrets rotation. E.g. if you want to use aws secrets manager's 
built-in secrets rotation capabilities, the existing backend [now supports 
it](https://github.com/apache/airflow/pull/18764). and presumably with most of 
the backends we have, you could implement rotation with processes external to 
airflow.
   
   Yeah. I know that, but one problem I have with it is that it is AWS only and 
you have to add quite a lot of code to support similar patterns for other 
secrets. I thought about something more "generic" . Maybe we should apply the 
approach of AWS for all other backends? Maybe we can find a similar,  
generic-enough solution for all of them.  I think your proposal is attempt to 
do so, but I think (see below) it might be that it looks at the problem from a 
wrong angle.
    
   > is that really all that common?
   
   I have no "hard" data on that one. Just anecdotal evidence. 
   
   But when I take off my "Airflow" hat and start to think as a user (in this 
case a person who operates organisational secrets and has to organize 
management. security, rotation etc.), I believe our approach is very "airflow 
centric" and in fact "selfish". If I were such a person who manages 
"credentials" for an organisation, I'd be quite pissed of at Airflow. Why?
   
   Secrets storages are "shared" resources for an organisation. IMHO we should 
not think about them as "general storage" which adapts to the client but they 
are "centerpiece" of managing identities, secrets, passwords etc. by the 
organisations. And often follow a rigorous structure of secrets stored there, 
that is decided on an managed on "organisational" level. For example it is 
important, I believe, that if there is a service that can be acesses by many 
clients, the "secret" (password/token) is kept in the secret manager only once 
- and it's the clients should adapt to where and how it should be retrieved 
from. It simplifies rotation, security, access audits, security crisis 
responses etc.
   
   The big problem with the current approach of Airflow is that Airflow expects 
the secret to be in a format that only Airflow understands. And it's the same - 
whether it's URI or dictionary in your proposal. No other system that needs the 
same password for the same external service will be able to use it (unles they 
specifically implement support for "airflow-formatted-secret"). This basically 
means that if the same token needs to be used by another (non-Airlfow) client, 
the organisation will have to rotate it in two places - once in the format that 
is required by Airflow, and once in the format that is understood by the other 
client. 
   
   The AWS implementaiton is I think the manifestation of that issue - the 
reason why we have configurable fields is because we wanted to tap into "non 
airflow formats" of storing the passwords/token. Where I think the only 
"universal" assumption we can make is that "password" or "token" is stored as 
"single, standalone secret". Not as part of URI, not as part of dictionary of 
specific structure. This is one of the reasons why for example LDAP clients are 
so configurable - because the clients have to be able to adapt to the 
organisational schema chosen by the organisation, not the other way round. AWS 
implementation (with full_url_mode = false) now is pretty much what LDAP 
clients do.
   
   Looking at it from the organisation that has to maintain many systems, I 
think it's very "selfish" (and self-centered) from  Airflow side to expect the 
organisation to store the "token/password" in the format that only Airflow 
understands.
   
   > i think i'm a bit skeptical that we should get more involved in secrets / 
connections management / rotation. i think it might be best to leave it to the 
external tools.
   
   Absolutely agree. And I am not about managing the connection. The proposal I 
made with `SECRET:' is probably not good (for the reason you explained below. 
It was really one of the ways how to retrieve the password as single "secret" 
and plug-in the existing Airflow approach. But the AWS approach is indeed 
better (though more complex implementation and more difficult to generalize to 
all secret backends). Maybe we can find a solution that will:
   
   * apply to all secrets (like your dictionary proposal)
   * will have the configurabiliy that AWS secret backend has 
   * does not force the organisation to keep "credentials" as part of the 
structure that is Airflow-specific
    
   > and pulling _part_ of a connection from secrets backend and _another_ part 
of it from _another_ secrets backend... that could get confusing pretty 
quickly, particularly when considering the diversity in the structure of 
connections in airflow. e.g. it's not just `password` that might be "secret" 
and in need of rotation..
   
   Agree. My idea is quite bad from that point of view. Though it is generic 
enough - it is not limited to password only - the "SECRET://" pattern could be 
used in any of the fields of the connection (including extras).
   
   > separately, what do you think of the general idea of this PR though? 
should i proceed with it you think? i think i would also like to add abilitity 
to load from cli using json instead of URI (i.e. putting json on same level as 
URI) and, ultimately i think it's best to deprecate airflow URI but that could 
be more controversial.
   
   I think it depends on whether you (and others) would follow my proposed line 
of thinking. As explained above - it still requires from the organisation to 
keep the structure that only Airflow understands. I think this direction is not 
good in general, I think we should implement a way to retrieve secrets 
similarly to what AWS does rather than introduce yet-another format of storing 
secrets, but If I am the only one who thinks this is wrong and won't convince 
others, then yeah - it's certainly better than URI, so I am not going to block 
it.
   
   > already we have some secrets backends supporting json values, and we could 
continue to add support on a piecemeal basis, but i figure we should just make 
it first class citizen and standardize
   
   The "standardize" part is precisely what worries me as we are trying to 
impose "airflow-standard" on something that I feel we should not (i.e. 
centralized secret storage). 
   
   But - I might be as well wrong. Maybe it is common practice to store secrets 
in proprietary formats. Maybe it's simply worth to check it with a number of 
(big corporate) users who have their experiences with that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to