Hi,

We run Apache Airflow as a set of k8s deployments inside a GKE cluster,
similar to the setup described in mumoshu's GitHub repo:
https://github.com/mumoshu/kube-airflow.

We are investigating securing our use of Airflow and are wondering about
some of Airflow's implementation details. Specifically, some of our tasks
run on workers that have access to sensitive data, and some of that data
can make its way into the task logs. We want to make sure it isn't passed
around to other components (e.g. the scheduler, metadata database, or
message queue), and that if it is, it is encrypted on the network (e.g.
via mutual TLS).
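
For concreteness, here is roughly the kind of configuration we have in
mind for encrypting those two hops. It is only a sketch: the section/key
names are our reading of the Airflow and Celery docs and may differ
between versions, and the host names, ports, paths, and credentials are
placeholders.

    # airflow.cfg (sketch only)
    [core]
    # TLS to the metadata DB; sslcert/sslkey supply the client half of
    # mutual TLS (standard libpq/psycopg2 connection parameters)
    sql_alchemy_conn = postgresql+psycopg2://airflow:***@postgres:5432/airflow?sslmode=verify-full&sslrootcert=/certs/ca.pem&sslcert=/certs/client.crt&sslkey=/certs/client.key

    [celery]
    # RabbitMQ over AMQPS (TLS) instead of plain AMQP
    broker_url = amqps://airflow:***@rabbitmq:5671/airflow

Does that cover the worker <-> database and worker <-> broker hops, or is
there another channel we should be thinking about?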

- Does Airflow send task logs to the Postgres DB or to RabbitMQ?
- Is the information in Postgres mainly operational in nature? (See the
  sketch after this list for how we plan to spot-check this.)
- Is the information in RabbitMQ mainly operational in nature?
- What about the scheduler?
- Anything else we're missing?
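
On the Postgres question, we were planning to spot-check the metadata DB
ourselves. Below is a minimal sketch of what we had in mind, assuming the
Airflow 1.x models; our (possibly wrong) assumption is that XCom is the
main place task-pushed data would land, and the model/attribute names may
differ in other versions.

    # Sketch: list recent XCom rows to see whether tasks are pushing
    # payloads into the metadata DB (Airflow 1.x model names assumed).
    from airflow import settings
    from airflow.models import XCom

    session = settings.Session()
    for row in session.query(XCom).order_by(XCom.timestamp.desc()).limit(20):
        # Print keys only; we deliberately avoid printing the values.
        print(row.dag_id, row.task_id, row.key)
    session.close()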

Any ideas are appreciated!

Thanks in advance!
