While Googling something Airflow-related a few weeks ago, I noticed that
someone's Airflow dashboard had been indexed by Google and was accessible
to the outside world without authentication. A little more Googling
revealed a handful of other indexed instances in various states of
security. I did my best to contact the operators, and waited for responses
before posting this.

Airflow is not a secure project by default (see
https://issues.apache.org/jira/browse/AIRFLOW-2047), and you can do all
sorts of mean things to an instance that hasn't been intentionally locked
down. (And even then, you shouldn't rely exclusively on your app's
authentication for providing security.)

Having "internal" dashboards, data sources, or executors exposed to the web
is dangerous: old versions can stick around for a very long time, a
compromised instance can help attackers get into unrelated deployments, and
a mass compromise would create very bad press for the overall project (see:
Redis and MongoDB).

Shipping secure defaults is hard, but perhaps we could add best practices
like instructions for deploying a robots.txt with Airflow? Or an impact
statement about what someone could do if they access your Airflow instance?
I think that many people deploying Airflow for the first time might not
realize that it can get indexed, or how much damage someone can cause by
accessing it.
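
To make the robots.txt idea concrete, here is a minimal sketch of a deny-all
file, checked with Python's standard urllib.robotparser. The helper name
is_crawlable and the hostname airflow.example.com are made up for
illustration, and how the file would actually be served alongside Airflow is
not shown. Note that robots.txt only asks well-behaved crawlers not to index
a site; it does nothing to restrict access.

```python
from urllib.robotparser import RobotFileParser

# Deny-all robots.txt: asks compliant crawlers not to index any path.
ROBOTS_TXT = "User-agent: *\nDisallow: /\n"


def is_crawlable(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper for illustration; not part of Airflow.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical hostname, used only for illustration.
    print(is_crawlable(ROBOTS_TXT, "https://airflow.example.com/admin/"))
```

A check like this could sit in a deployment smoke test, so that a missing or
permissive robots.txt gets noticed before a crawler finds the instance.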
