Hi Folks, Product Manager for MWAA weighing in here, having spoken to--quite literally--hundreds of Airflow customers (both MWAA and in general).
Enterprise organizations--those that use Airflow at scale--typically separate their "Administrators" from their "Users". The former set up the security controls and make sure that users can't violate their organization's data security, while still providing access to (often sensitive) data so that users can accomplish their business goals. The latter are the folks writing DAGs and monitoring their execution, and they sometimes see those security controls as a hindrance to the ease with which they can write their data pipelines and orchestration.

The weak spot in the security model is the web-based user interface. It needs to be accessible to users, sitting at their laptops, with relative ease, but it cannot be permitted to perform arbitrary tasks; otherwise it can escape the bounds set for it.

Airflow is wonderful in that it's entirely written in Python and extensible. However, that same ease of extensibility could easily be used to bypass the Administrator's security controls, such as auth plugins, and give users access beyond what they should rightfully have (whether deliberately or by accident). The only way to be 100% sure that users aren't changing the way the web server behaves is to not permit its alteration. UI plugins, package installations, and library changes are among the avenues that could be exploited. For example, I could write a plugin that patches the auth functions and gives everyone Admin access regardless of their predetermined role.

Without strict security controls there will be a limit to Airflow adoption amongst Enterprise customers. For Airflow to grow, it must offer a secure-by-design-friendly infrastructure. Ideally the web server is a window into what Airflow is doing, but does not allow access to or modification of any of the internal behaviour of the system. Should there be some sort of signed and verified packages in the future, perhaps organizations will be more open to extensibility.
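To make the plugin scenario mentioned above a little more concrete, here is a minimal sketch in plain Python. It assumes nothing about Airflow's or Flask-AppBuilder's real internals: `toy_auth`, `has_access`, `view_admin_page`, and `load_untrusted_plugin` are all hypothetical names invented for illustration. The only point is that code loaded into the same Python process as the web server can rebind the function that performs the access check.

```python
import types

# Hypothetical stand-in for a web server's authorization module.
# This is NOT Airflow's or Flask-AppBuilder's actual API.
auth = types.ModuleType("toy_auth")


def _has_access(user_role: str, required_role: str) -> bool:
    """Grant access only when the user's role matches the required role."""
    return user_role == required_role


auth.has_access = _has_access


def view_admin_page(user_role: str) -> str:
    """A toy 'view' that looks up the access check on the module at call time."""
    if auth.has_access(user_role, "Admin"):
        return "admin page"
    return "403 Forbidden"


def load_untrusted_plugin() -> None:
    """Simulates importing a plugin that runs inside the web server process."""
    # Rebinding the attribute replaces the check for every later caller.
    auth.has_access = lambda user_role, required_role: True


before = view_admin_page("Viewer")  # check is intact
load_untrusted_plugin()
after = view_admin_page("Viewer")   # check has been replaced
print(before, "->", after)
```

In this toy setup the role assigned by the Administrator never changes; only the function that evaluates it does, which is why restricting what can be loaded into the web server process is the control being argued for here.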
However, the "shared responsibility model" does not allow service providers, be it Astronomer, Google, AWS, or anyone else, to be cavalier with customers' security concerns; they must always default to the strictest security settings possible. Customers look to managed services to provide guard rails that protect them from data breaches while still letting them benefit from the features and capabilities of the software platform.

Cheers,
John

On 2021-06-18, 11:40 AM, "Jarek Potiuk" <[email protected]> wrote:

I agree that this thread is probably not the right place to categorize the offering, but I also concur with Ash that we need a better understanding of the risks involved. I think I "feel" where it comes from, and intuitively see that you might want to add extra layers of precaution (and likely follow pressure from the internal security teams), but Ash's point is also quite important. We should get to the bottom of it, and if there are some real threats that we are not aware of, I think sharing details on [email protected] is the right thing to do. Maybe we will find that other users of Airflow are also at risk, and we might want to protect them (all managed services, but also individual installations) in the future by introducing some changes in this model.

BTW, Subash - you do not need to have a subscription to write to [email protected]. Just send an email with the details; we will get it and will be able to keep you in the discussion when it follows. Also, some information for your security team: https://www.apache.org/dev/pmc.html#mailing-list-private. One of the main purposes of the private@ mailing list is pre-disclosing security problems related to the project. And we are all obliged as PMC members (and all ASF members who read the list as well) to not disclose what is discussed there.
J,

On Fri, Jun 18, 2021 at 4:04 PM Ash Berlin-Taylor <[email protected]> wrote:
>
> No one has yet explained what the security concerns actually are. Is there some concrete thing that is a worry, or is it merely a concern that more things installed = marginally more risky?
>
> The blast radius is limited to a single Airflow deployment, and access is, I assume, sufficiently gated behind IAM perms anyway?
>
> By not letting users install extra modules into the webserver image you are also removing their ability to use third-party providers, such as these:
>
> https://github.com/great-expectations/airflow-provider-great-expectations
> https://github.com/fivetran/airflow-provider-fivetran
> https://github.com/anyscale/airflow-provider-ray
>
> -- and there are only going to be more of these over time.
>
> Not to mention this blocks UI plugins entirely.
>
> I don't quite understand why MWAA concerns itself with exactly what is being installed in the webserver image on top of Airflow -- the Amazon Shared Responsibility model would, I think, already cover the "AWS takes care of the base, 'you' take care of what is running" split (but I confess I haven't re-read it in a number of years).
>
> -ash
>
> On Fri, Jun 18 2021 at 07:06:53 +0000, "Canapathy, Subash" <[email protected]> wrote:
> > Irrespective of personal categorization of the managed offering's Airflow-ness, there are obligations to adhere to a security bar and to secure against any attack vectors a UI feature can introduce – and this will be true for any cloud service provider. I want to clarify that we were not suggesting changing any assumptions in the current way of packaging providers, but merely noting that we cannot treat this as equivalent to the earlier mono repo and add all 60+ of them to the base image.
> >
> > Going back to the original discussion, we are in the process of pre-installing providers with an Apache 2 license right away, and others will be added (with approved exception) based on user demand.
> >
> > From: Ash Berlin-Taylor <[email protected]>
> > Reply-To: "[email protected]" <[email protected]>
> > Date: Wednesday, June 16, 2021 at 1:11 AM
> > To: "[email protected]" <[email protected]>
> > Subject: RE: [EXTERNAL] [DISCUSS] Managing provider Connections via UI in managed Airflow services
> >
> > On Tue, Jun 15 2021 at 18:21:56 +0000, "Canapathy, Subash" <[email protected]> wrote:
> > > Regarding security constraints on why we disallow plugins and requirements on the webserver, I will have to discuss this in person with the PMC, but on a high level this comes down to remote code execution prevention on managed instances, opening possibilities of exploiting vulnerabilities in flask-app-builder and the underlying Python runtime.
> >
> > I'm sorry, I don't agree with this summary.
> >
> > Airflow's job is to run user-submitted code, and to allow the UI to be pluggable.
> >
> > Are you providing Airflow, or an Airflow-like service?
