Hi Folks,

Product Manager for MWAA weighing in here, having spoken to--quite 
literally--hundreds of Airflow customers (both MWAA and in general).

Enterprise organizations--those that use Airflow at scale--typically separate 
their "Administrators" from their "Users".  The former set up the security 
controls and make sure that users can't violate their organization's data 
security while still providing access to (often sensitive) data in order to 
accomplish their business goals.  The latter are the folks writing DAGs and 
monitoring their execution, and sometimes see those security controls as a 
hindrance to the ease with which they can write their data pipelines and 
orchestration.

The weak spot in the security model is the web-based user interface.  It needs 
to be accessible to users, sitting at their laptops, with relative ease, but 
it cannot be permitted to perform arbitrary tasks; otherwise it can escape the 
bounds set for it.  Airflow is wonderful in that it's entirely written in Python 
and extensible.  However, that same ease of extensibility could easily be used 
to bypass the Administrator's security controls, such as auth plugins, and 
allow users access beyond what they should rightfully have (whether 
deliberately or by accident).

The only way to be 100% sure that users aren't changing the way the web server 
behaves is to not permit its alteration.  UI plugins, package installations, 
and library changes are among the various vulnerabilities that could be 
exploited.  For example, I could write a plugin that patches the auth functions 
and allows everyone Admin access regardless of their predetermined role.  
Without strict security controls there will be a limit to Airflow adoption 
amongst Enterprise customers.  For Airflow to grow, it must offer a 
secure-by-design-friendly infrastructure.  Ideally the web server is a window 
into what Airflow is doing, but does not allow access to, or modification of, 
any of the internal behaviour of the system.
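
To make the plugin example above concrete, here is a minimal sketch of the 
kind of monkeypatch I mean.  It is a toy and not Airflow's actual code: 
ToySecurityManager, has_access, and load_plugin are hypothetical names made up 
for illustration.  The only point is that any code imported into the web 
server process can rebind a permission check at runtime.

```python
# Toy illustration only: none of these names come from Airflow itself.


class ToySecurityManager:
    """Hypothetical stand-in for a web server's role-based permission check."""

    ROLE_PERMISSIONS = {
        "Viewer": {"read_dag"},
        "Admin": {"read_dag", "edit_connection"},
    }

    def has_access(self, role: str, action: str) -> bool:
        # Allow the action only if the role's permission set contains it.
        return action in self.ROLE_PERMISSIONS.get(role, set())


def load_plugin() -> None:
    """Simulates what a plugin could do when imported into the same process."""
    # Rebinding the method on the class affects every instance, old and new.
    ToySecurityManager.has_access = lambda self, role, action: True


manager = ToySecurityManager()
before = manager.has_access("Viewer", "edit_connection")  # role check enforced
load_plugin()
after = manager.has_access("Viewer", "edit_connection")   # role check bypassed
print(before, after)
```

Because the plugin runs in the same interpreter as the check, nothing inside 
that process can reliably prevent the rebinding, which is why not loading user 
code into the web server is the only control I would fully trust.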

Should there be some sort of signed and verified packages in the future, 
perhaps organizations will be more open to extensibility.  However, the "shared 
responsibility model" does not allow service providers, be it Astronomer, 
Google, AWS, or anyone else, to be cavalier with customers' security concerns; 
they must always default to the strictest security settings possible.  Customers 
look to managed services to provide guard rails that protect them from data 
breaches while still letting them benefit from the features and capabilities of 
the software platform.

Cheers,

John

On 2021-06-18, 11:40 AM, "Jarek Potiuk" <[email protected]> wrote:



    I agree that this thread is probably not good for categorization of
    the offering but I also concur with Ash to get a better understanding
    of the risks involved.

    I think I "feel" where it comes from and intuitively see that you
    might want to add additional or extra layers of precautions (and
    likely follow pressures from the internal security teams) but also
    Ash's point is quite important. We should get to the bottom of it, and
    if there are some real threats that we are not aware of, I think
    sharing details on [email protected] is the right thing to
    do.

    Maybe we will find that other users of Airflow are also at risk and we
    might want to protect them (and also all managed services but also
    individual installations) in the future by introducing some changes in
    this model.

    BTW. Subash - you do not need to have a subscription to write to
    [email protected]. Just send an email with the details and
    we will get it and we will be able to keep you in the discussion as it
    continues. Also, for the information of your security team:
    https://www.apache.org/dev/pmc.html#mailing-list-private . One of the
    main purposes of the private@ mailing list is pre-disclosing security
    problems related to the project. And we are all obliged as PMC members
    (and all ASF members who read the list as well) to not disclose what is
    discussed there.

    J,

    On Fri, Jun 18, 2021 at 4:04 PM Ash Berlin-Taylor <[email protected]> wrote:
    >
    > No one has yet explained what the security concerns actually are. Is there 
some concrete thing that is a worry, or is it merely a concern that more things 
installed = marginally more risky?
    >
    > The blast radius is limited to a single Airflow deployment, and access is 
I assume sufficiently gated behind IAM perms anyway?
    >
    > By not letting users install extra modules into the webserver image you 
are also removing their ability to use third party providers, such as these:
    >
    > https://github.com/great-expectations/airflow-provider-great-expectations
    > https://github.com/fivetran/airflow-provider-fivetran
    > https://github.com/anyscale/airflow-provider-ray
    >
    > -- and there are only going to be more of these over time.
    >
    > Not to mention this blocks UI plugins entirely.
    >
    > I don't quite understand why MWAA concerns itself with exactly what is 
being installed in the webserver image on top of Airflow -- the Amazon Shared 
Responsibility model would I think already cover the "AWS takes care of the 
base, 'you' take care of what is running" (but I confess I haven't re-read it 
in a number of years)
    >
    > -ash
    >
    > On Fri, Jun 18 2021 at 07:06:53 +0000, "Canapathy, Subash" 
<[email protected]> wrote:
    >
    > Irrespective of personal categorization of the managed offering's 
Airflow-ness, there are obligations to adhere to a security bar and to secure 
against any attack vectors a UI feature can introduce – and this will be true 
for any cloud service provider. I want to clarify that we were not suggesting 
changing any assumptions in the current way of packaging providers but merely 
citing that we cannot use equivalence to the earlier mono repo and add all 60+ 
of them to the base image.
    >
    >
    >
    > Going back to the original discussion, we are in the process of 
pre-installing providers with an Apache 2 license right away, and others will be 
added (with an approved exception) based on user demand.
    >
    >
    >
    > From: Ash Berlin-Taylor <[email protected]>
    > Reply-To: "[email protected]" <[email protected]>
    > Date: Wednesday, June 16, 2021 at 1:11 AM
    > To: "[email protected]" <[email protected]>
    > Subject: RE: [EXTERNAL] [DISCUSS] Managing provider Connections via UI in 
managed Airflow services
    >
    >
    >
    > On Tue, Jun 15 2021 at 18:21:56 +0000, "Canapathy, Subash" 
<[email protected]> wrote:
    >
    > Regarding security constraints on why we disallow plugins and 
requirements on the webserver, I will have to discuss this in person with the 
PMC, but at a high level this comes down to remote code execution prevention on 
managed instances, opening possibilities of exploiting vulnerabilities in 
flask-app-builder and the underlying python runtime.
    >
    >
    >
    > I'm sorry, I don't agree with this summary.
    >
    >
    >
    > Airflow's job is to run user-submitted code, and to allow the UI to be 
pluggable.
    >
    >
    >
    > Are you providing Airflow, or an Airflow-like service?



    --
    +48 660 796 129
