@Ash Berlin-Taylor: I don't think that is entirely true. In 1.10 the connection template code was part of the Flask application and was not bundled with the providers. Managed services took the webserver baseline as is and let customers decide on add-ons such as facebook-business, Oracle, etc., without bundling them into the managed service software, per AWS compliance guidelines. In 2.0, if we bake in all the providers, we also bake in their dependencies.
For example, search for "facebook-business" in the following files:

1.10 constraints file: https://github.com/apache/airflow/blob/constraints-1-10/constraints-3.7.txt (does not list facebook-business as a dependency)
2.0 constraints file: https://github.com/apache/airflow/blob/constraints-2-0/constraints-3.7.txt (lists facebook-business as a dependency)

This is one example; I can pull in other LGPL ones similarly. The point is that the connections code from the Flask app now lives in the provider packages, which brings in the requirements for everything related to a provider as one package.

Regarding the security constraints behind disallowing plugins and requirements on the webserver, I will have to discuss this in person with the PMC, but at a high level it comes down to preventing remote code execution on managed instances, which would otherwise open up possibilities of exploiting vulnerabilities in Flask-AppBuilder and the underlying Python runtime. There are two levels of isolation: first, the single tenancy of environments in MWAA under separate VPCs, and second, Fargate, which prevents exploits from breaking out of the container boundary into the hypervisor. Even with those, our security team unearthed other possible exploits in penetration testing, and that led to this decision.

Thanks
Subash

From: Ash Berlin-Taylor <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tuesday, June 15, 2021 at 7:18 AM
To: "[email protected]" <[email protected]>
Subject: RE: [EXTERNAL] [DISCUSS] Managing provider Connections via UI in managed Airflow services

Hi Subash,

If your concern is about licensing, then you have a false sense of compliance in 1.10: the dependencies for the "providers" haven't really changed between 1.10 and 2.0. The same (L)GPL, Facebook, etc. licensed modules are there in both 1.10 and the 2.0 providers.

My same question to you: can you elaborate (privately if necessary) on what the security concerns here are?

-ash

On Mon, Jun 14 2021 at 19:59:25 -0000, Subash Canapathy <[email protected]> wrote:

Hi Jarek

Thank you for surfacing this issue for discussion.

The major hurdle for managed services, apart from the security constraints, is on the licensing side. Previously, when the code needed for connection templates was part of Airflow, we were able to bundle it as a solution because the code was under the Apache v2 license. Now that it has been separated out into provider packages, those come with dependencies that do not have "blessed" licenses that allow bundling them into a managed service. I am sure the GCP folks have similar restrictions that explain why they cannot add all 60+ providers as is into the base image.

We recently did the manual exercise of assessing each of those provider packages and their dependencies, and only 20 of them made the cut for not requiring additional licenses such as the Facebook license, LGPL, etc.

Thanks
Subash Canapathy

On 2021/06/14 16:28:46, Ash Berlin-Taylor <[email protected]> wrote:

Can you elaborate (privately if you have to) on what the security concerns are? As I understand it, the web server is per deployment, so anything should be limited to one customer/user/deployment.

There is also the new "test connection" feature, which will need the provider code installed to work.

Then there's the issue of third-party connections, of which there are only going to be more over time.

-ash

On 14 June 2021 16:35:42 BST, Eugen Kosteev <[email protected]> wrote:
>Hi Jarek.
>
>Thanks for the discussion.
>The issue with Connections management in the web server that you described
>did indeed affect Cloud Composer in the released preview image versions of
>Airflow 2.0.1 (link to public issue
>https://issuetracker.google.com/issues/190189297). And as you stated, we do
>not install PyPI packages in the web server image, mostly because of
>security concerns.
>
>As a temporary workaround we baked all connections (a list of them with
>their widgets pickled and stored inside) into the web server image, so that
>customers can add/edit them (even though not all provider packages are
>pre-installed). This is a temporary workaround that we came up with for now,
>and we are looking for a long-term solution.
>
>Our thoughts/ideas for alternative solutions:
>1. We do not want to pre-install all provider packages, so as not to
>generate unnecessary Python dependencies. Or maybe we could do this only for
>web server images (not scheduler/worker), but then it is not clear whether
>it is a good idea to have such a discrepancy between PyPI dependencies in
>web server vs scheduler/worker images.
>2. Downloading and baking provider packages (wheel files) into the Docker
>image and installing the customer-specific/required version on demand looks
>infeasible, taking into account the number of providers, their versions and
>their dependencies.
>
>- Eugene
>
>On Sun, Jun 13, 2021 at 6:46 PM Jarek Potiuk <[email protected]> wrote:
>
>> Dear Airflow community,
>>
>> Here is another result of discussions. I would like to draw attention
>> to potential Connection management problems that might affect managed
>> services for Airflow 2.0 and some providers.
>>
>> With Airflow 2.0, connection UI "customisations" are baked into the
>> provider package, and in order to see, for example, the Postgres
>> connection in the UI, you need to have the "postgres" provider installed
>> in the Webserver.
>>
>> As far as I know, some of the Managed Airflow services (MWAA, Composer,
>> possibly others) do not currently allow their users to install
>> additional packages in the webserver (the webserver container is
>> different from the scheduler/worker). This makes it impossible to
>> configure/edit provider connections via the UI (unless those providers
>> are pre-installed in the webserver image).
>>
>> While it is understandable from a security point of view to forbid "any"
>> package installation, I think the official
>> "apache-airflow-providers-*" packages should be allowlisted for those
>> images and installed or otherwise made available (for example by
>> pre-installing all providers in the webserver image, if rebuilding the
>> image dynamically is not possible from a security point of view).
>>
>> I wonder what people (and especially the people from the MWAA and
>> Composer teams) think about it. Do I have it right about the security
>> concerns? Any other comments?
>>
>>
>> J.
>>
>
>
>--
>Eugene
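
For anyone who wants to repeat the constraints-file comparison Subash describes (searching for "facebook-business" in the 1.10 and 2.0 files), here is a minimal Python sketch. It assumes you have already downloaded the two constraints files and read them as text. The function names and the sample excerpts, including the version numbers, are made up for illustration and are not taken from the real files. A package appearing in a constraints file only shows that it is pinned there; it does not settle the licensing question debated above.

```python
import re
from typing import Dict, List


def constrained_packages(constraints_text: str) -> Dict[str, str]:
    """Map normalized package name -> pinned version for 'name==version' lines."""
    pins: Dict[str, str] = {}
    for raw in constraints_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or "==" not in line:
            continue
        name, version = line.split("==", 1)
        # Normalize names the way PEP 503 does (case-insensitive, -/_/. equal).
        key = re.sub(r"[-_.]+", "-", name.strip()).lower()
        pins[key] = version.strip()
    return pins


def only_in_second(first_text: str, second_text: str) -> List[str]:
    """Return packages pinned in the second file but not in the first."""
    first = constrained_packages(first_text)
    second = constrained_packages(second_text)
    return sorted(set(second) - set(first))


# Hypothetical excerpts; the real files are at the URLs quoted in the thread.
SAMPLE_1_10 = """\
# illustrative excerpt only
Flask==1.1.2
psycopg2-binary==2.8.6
"""
SAMPLE_2_0 = """\
Flask==1.1.2
facebook-business==10.0.0
psycopg2-binary==2.8.6
"""

added = only_in_second(SAMPLE_1_10, SAMPLE_2_0)
print(added)
```

With the real files, pass their contents in place of the two sample strings.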

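
Jarek's suggestion to allowlist the official "apache-airflow-providers-*" packages could be approximated with a simple name check like the one below. This is only a sketch under stated assumptions: the pattern tuple and the function names are hypothetical, and nothing here reflects how MWAA or Composer actually validate requirements. A name match alone is not a security control, since it says nothing about where the package is downloaded from or what its dependencies pull in.

```python
import re
from fnmatch import fnmatchcase
from typing import Iterable

# Hypothetical allowlist: only the official provider naming scheme.
ALLOWED_PATTERNS = ("apache-airflow-providers-*",)


def normalize(name: str) -> str:
    """Normalize a distribution name as in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def is_allowlisted(requirement: str,
                   patterns: Iterable[str] = ALLOWED_PATTERNS) -> bool:
    """Check whether the package named in a requirement line matches a pattern."""
    # Keep only the name: cut at extras, version specifiers, markers or spaces.
    name = re.split(r"[\[<>=!~; ]", requirement.strip(), maxsplit=1)[0]
    normalized = normalize(name)
    return any(fnmatchcase(normalized, pattern) for pattern in patterns)


requested = [
    "apache-airflow-providers-postgres==1.0.2",
    "apache_airflow_providers_amazon",
    "facebook-business",
]
accepted = [req for req in requested if is_allowlisted(req)]
print(accepted)
```

A service would still need to decide separately which provider dependencies it is willing to ship, which is the licensing point raised by Subash.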