potiuk commented on PR #35748: URL: https://github.com/apache/airflow/pull/35748#issuecomment-1818945067
Sharing the process namespace in this case more or less violates the whole assumption that the separate kerberos container was introduced for. The whole idea is that the "airflow" containers do not see the Kerberos keytab used to refresh the short-lived ticket. Only the kerberos container / refreshing process should ever be able to get access to the keytab.

Keytabs are long-lived and provide full access to the Kerberos service. They use symmetric encryption, and once you get hold of one you are able to communicate with the Kerberos server and obtain the short-lived ticket to do the job you are supposed to do, so a keytab should be very strongly guarded. Generally, Airflow components should never be able to see or obtain keytab files (only the `airflow kerberos` refreshing container should have access to them); instead, all the components should only access the short-lived ticket. This is the basic assumption that the whole separation of the containers is based on: the two containers share only the filesystem where the ticket is refreshed, and nothing else. See https://www.fortinet.com/resources/cyberglossary/kerberos-authentication for example.

If Airflow components (specifically the worker) get access to the keytab, someone could write a DAG to, for example, send the keytab to a remote system, and once this happens the keytab can be used by anyone to do anything. When you do the same with a short-lived ticket, you are limited to only what the service allows the ticket to do, and it is time-limited, so the exposure from such a breach is potentially much more limited.

Of course, using `shareProcessNamespace` does not explicitly violate this assumption. It does not give the Airflow component direct access to the keytab. So far so good. But it opens the way for malicious actors to obtain the keytab by other means. For example, you could attach to the kerberos process with gdb, dump its memory, and retrieve the keytab from the memory of that process.
And this is only one of the ways you could retrieve that information; there are many others. Some of them you might be able to protect against better, but generally speaking, sharing the process namespace significantly decreases the isolation that the container mechanism introduces (which is the reason why the two are run in separate containers).

So while this solution does not directly give access to the keytab, it does decrease isolation and introduces security risks. I am quite sure that if we introduce it, it will be flagged as a security issue and we will have to fix it.

I think there are two options:

1) We should figure out a different way to communicate with the kerberos refreshing process and stop it. One option is to let the running process be stoppable by sending a "shutdown" message via a TCP connection, and to expose the sidecar container's port to the main containers. Another option (probably simpler to implement) is to keep some kind of "shutdown" lock file in the same filesystem where the ticket is stored and use it to signal the refreshing process that it should exit. It could be based on the lock mechanism available natively in Python.

2) If we limit the solution to Kubernetes 1.28+ only, we have the option of using Native Sidecar Containers and marking the kerberos sidecar as such: https://kubernetes.io/blog/2023/08/25/native-sidecar-containers/

Likely the best is a combination of these: 1) (for k8s < 1.28) and 2) (for k8s >= 1.28).
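
To make the "shutdown" file variant of option 1) a bit more concrete, here is a minimal sketch in Python. It is an illustration only and not how `airflow kerberos` is implemented: the marker name `kerberos-shutdown`, the functions `request_shutdown` and `refresh_loop`, and the interval values are all hypothetical. It also uses a plain marker file rather than an OS-level file lock, which is an assumption made to keep the example short.

```python
import time
from pathlib import Path

# Hypothetical marker name; not an existing Airflow convention.
SHUTDOWN_MARKER = "kerberos-shutdown"


def request_shutdown(shared_dir: Path) -> None:
    """Called by the main container when its work is done."""
    (shared_dir / SHUTDOWN_MARKER).touch()


def refresh_loop(shared_dir: Path, refresh, interval: float = 60.0, poll: float = 1.0) -> int:
    """Call refresh() every `interval` seconds until the marker file appears.

    `shared_dir` is the volume both containers already share for the ticket.
    Returns the number of refreshes performed.
    """
    marker = shared_dir / SHUTDOWN_MARKER
    count = 0
    while not marker.exists():
        refresh()  # stand-in for the real ticket renewal
        count += 1
        deadline = time.monotonic() + interval
        # Sleep in short steps so a shutdown request is noticed promptly.
        while time.monotonic() < deadline and not marker.exists():
            time.sleep(poll)
    return count
```

With something like this, the sidecar would only need write access to the shared ticket volume from the main container, so the keytab and the process namespace stay private to the kerberos container.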
