Hi,

We run Apache Airflow as a set of k8s deployments inside a GKE cluster, similar to the setup described in Mumoshu's GitHub repo: https://github.com/mumoshu/kube-airflow.
We are investigating securing our use of Airflow and are wondering about some of Airflow's implementation details. Specifically, we run some tasks where the workers have access to sensitive data, and some of that data can make its way into the task logs. We want to make sure it isn't passed around, e.g. to the scheduler, database, or message queue, and if it is, that it is encrypted in any network traffic (e.g. via mutual TLS; a rough sketch of the kind of configuration we have in mind is at the end of this mail).

- Does Airflow pass logs to the Postgres DB or to RabbitMQ?
- Is the information in Postgres mainly operational in nature?
- Is the information in RabbitMQ mainly operational in nature?
- What about the scheduler?
- Anything else we're missing?

Any ideas are appreciated! Thanks in advance!
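
P.S. For context, here is a rough sketch of the kind of airflow.cfg changes we were considering for the RabbitMQ and Postgres connections. This assumes the Celery SSL options that recent 1.x releases expose and the standard libpq sslmode parameters; the hostnames, ports, and file paths are placeholders, and the exact option names may differ by Airflow version:

    [core]
    # Postgres over TLS; sslmode/sslrootcert are standard libpq URL parameters
    sql_alchemy_conn = postgresql+psycopg2://airflow:***@postgres:5432/airflow?sslmode=verify-full&sslrootcert=/etc/ssl/certs/pg-ca.pem

    [celery]
    # RabbitMQ over TLS (amqps, port 5671), with a client key/cert for mutual TLS
    broker_url = amqps://airflow:***@rabbitmq:5671/
    celery_ssl_active = True
    celery_ssl_key = /etc/ssl/private/worker.key
    celery_ssl_cert = /etc/ssl/certs/worker.pem
    celery_ssl_cacert = /etc/ssl/certs/ca.pem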
