@Ash Berlin-Taylor, I don’t think that is entirely true. In 1.10 the
connection templates code was part of the Flask application and was not
bundled with the provider. Managed services took the webserver baseline as is
and let customers decide on add-ons such as facebook-business, Oracle, etc.,
without bundling them into the managed service software, per AWS compliance
guidelines. In 2.0, if we bake in all the providers, we will be baking in
their dependencies along with them.

For example, search for “facebook-business” in the following files:
1.10 constraints file – 
https://github.com/apache/airflow/blob/constraints-1-10/constraints-3.7.txt 
(does not list facebook-business as a dependency)
2.0 constraints file – 
https://github.com/apache/airflow/blob/constraints-2-0/constraints-3.7.txt 
(lists facebook-business as a dependency)
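
The comparison above can also be scripted instead of searching by eye. The
following is only a minimal sketch: the helper names and the two sample
excerpts are made up for illustration (the versions are not copied from the
real constraints files), and it assumes the files use plain "name==version"
pins.

```python
import re


def normalize(name):
    """Normalize a package name the way PEP 503 does (case, '-', '_', '.')."""
    return re.sub(r"[-_.]+", "-", name.strip()).lower()


def pinned_packages(constraints_text):
    """Return {normalized_name: version} for every 'name==version' line."""
    pins = {}
    for raw in constraints_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[normalize(name)] = version.strip()
    return pins


def is_pinned(constraints_text, package):
    """True if the package appears as a pin in the constraints text."""
    return normalize(package) in pinned_packages(constraints_text)


# Hypothetical excerpts standing in for the downloaded files.
SAMPLE_1_10 = """\
# illustrative only
Flask==0.0.0
"""
SAMPLE_2_0 = """\
# illustrative only
Flask==0.0.0
facebook-business==0.0.0
"""

print(is_pinned(SAMPLE_1_10, "facebook-business"))
print(is_pinned(SAMPLE_2_0, "facebook-business"))
```

To run it against the real files, download each constraints file and pass its
contents to is_pinned() in place of the sample strings.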

This is one example; I can pull in other LGPL ones similarly. The point is that 
the connections code from the Flask app now lives elsewhere, thereby bringing in 
the requirements for everything related to the provider as one package.

Regarding the security constraints behind disallowing plugins and requirements 
on the webserver: I will have to discuss this in person with the PMC, but at a 
high level it comes down to preventing remote code execution on managed 
instances, which would open up possibilities for exploiting vulnerabilities in 
Flask-AppBuilder and the underlying Python runtime. There are two levels of 
isolation: first, the single tenancy of environments in MWAA under separate 
VPCs, and second, Fargate, which prevents exploits from breaking out of the 
container boundary into the hypervisor. Even with those, our security team 
unearthed other possible exploits in penetration testing, and that led to this 
decision.

Thanks
Subash

From: Ash Berlin-Taylor <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tuesday, June 15, 2021 at 7:18 AM
To: "[email protected]" <[email protected]>
Subject: RE: [EXTERNAL] [DISCUSS] Managing provider Connections via UI in 
managed Airflow services




Hi Subash,

If your concern is about licensing then you have a false sense of compliance 
in 1.10 -- the dependencies for the "providers" haven't really changed between 
1.10 and 2.0 -- the same (L)GPL, Facebook, etc. licensed modules are there in 
both 1.10 and the 2.0 providers.

My same question to you: can you elaborate (privately if necessary) on what 
the security concerns here are?

-ash


On Mon, Jun 14 2021 at 19:59:25 -0000, Subash Canapathy 
<[email protected]> wrote:

Hi Jarek,

Thank you for surfacing this issue for discussion. The major hurdle for 
managed services, apart from the security constraints, is on the licensing 
side. Previously, when the code needed for connection templates was part of 
Airflow, we were able to bundle it as a solution because the code was under 
the Apache v2 license. Now that it has been separated out into provider 
packages, those come with dependencies that do not have "blessed" licenses 
that allow bundling them into a managed service. I am sure the GCP folks have 
similar restrictions that prevent them from adding all 60+ providers as is 
into the base image. We recently did a manual exercise to assess each of those 
provider packages and their dependencies, and only 20 of them made the cut for 
not requiring additional licenses such as the Facebook license, LGPL, etc.

Thanks
Subash Canapathy

On 2021/06/14 16:28:46, Ash Berlin-Taylor <[email protected]> wrote:
Can you elaborate (privately if you have to) on what the security concerns 
are? As I understand it, the web server is per deployment, so anything should 
be limited to one customer/user/deployment.

There is also the new "test connection" feature that will need the provider 
code installed to work.

Then there's the issue of third party connections - of which there are only 
going to be more over time.

-ash

On 14 June 2021 16:35:42 BST, Eugen Kosteev <[email protected]> wrote:
>Hi Jarek.
>
>Thanks for the discussion.
>The issue with Connections management in the web server that you described
>has indeed affected Cloud Composer in the released preview image versions of
>Airflow 2.0.1 (link to public issue
>https://issuetracker.google.com/issues/190189297). And as you stated, we do
>not install pypi packages in the web server image, mostly because of security
>concerns.
>
>As a temporary workaround we baked all connections (a list of them with their
>widgets pickled and stored inside) into the web server image, so that
>customers can add/edit them (even though not all provider packages are
>pre-installed). This is a temporary workaround that we came up with for now
>and we are looking for a long-term solution.
>
>Our thoughts/ideas for alternative solutions:
>1. We do not want to pre-install all provider packages, so as not to generate
>unnecessary python dependencies. Or maybe we could do this only for web
>server images (not scheduler/worker), but then it is not clear whether it is a
>good idea to have such a discrepancy between pypi dependencies in web
>server vs scheduler/worker images.
>2. Downloading and baking provider packages (wheel files) into the docker
>image and installing the customer specific/required version on demand looks
>infeasible, taking into account the number of providers, their versions and
>their dependencies.
>
>- Eugene
>
>On Sun, Jun 13, 2021 at 6:46 PM Jarek Potiuk <[email protected]> wrote:
>
>> Dear Airflow community,
>>
>> Here is another result of discussions. I would like to draw attention
>> to potential Connection management problems that might affect managed
>> services for Airflow 2.0 and some providers.
>>
>> With Airflow 2.0, connection UI "customisations" are baked into the
>> provider package, and in order to see - for example - the Postgres
>> connection in the UI, you need to have the "postgres" provider installed
>> in the Webserver.
>>
>> As far as I know, some of the Managed Airflow services (MWAA, Composer,
>> possibly others) do not currently allow their users to install
>> additional packages in the webserver (the webserver container is different
>> than the scheduler/worker). This makes it impossible to configure/edit
>> provider connections via UI (unless those providers are pre-installed in
>> the webserver image).
>>
>> While it is understandable from a security point of view to forbid "any"
>> package installation, I think the official
>> "apache-airflow-providers-*" packages should be allowlisted for those
>> images and installed or otherwise made available (for example via
>> pre-installing all providers in the webserver image, if it is not possible
>> from a security point of view to rebuild the image dynamically).
>>
>> I wonder what people (and especially the people from the MWAA and Composer
>> teams) think about it - do I get it right about the security concerns? Any
>> other comments?
>>
>>
>> J.
>>
>> --
>> +48 660 796 129
>>
>
>
>--
>Eugene
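
For illustration, the allowlisting idea from Jarek's original message
(accepting only the official "apache-airflow-providers-*" packages for the
webserver image) might look roughly like the sketch below. The function names
and the sample requirement list are hypothetical and are not part of Airflow,
MWAA or Composer. A name-prefix check alone is also not a security boundary; a
real service would more likely keep an explicit list of vetted packages and
versions.

```python
import re

# Assumption: "official provider" is approximated by the package name prefix.
OFFICIAL_PROVIDER = re.compile(r"^apache-airflow-providers-[a-z0-9]+(-[a-z0-9]+)*$")


def requirement_name(requirement):
    """Extract the normalized package name from a requirement string."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    if not match:
        return None
    # PEP 503 style normalization: lower-case, runs of -_. become "-".
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def is_allowlisted(requirement):
    """True only for requirements naming an apache-airflow-providers-* package."""
    name = requirement_name(requirement)
    return bool(name and OFFICIAL_PROVIDER.match(name))


def split_requirements(requirements):
    """Partition requirements into (allowed, rejected) lists, keeping order."""
    allowed = [r for r in requirements if is_allowlisted(r)]
    rejected = [r for r in requirements if not is_allowlisted(r)]
    return allowed, rejected


# Hypothetical customer request for the webserver image.
requested = [
    "apache-airflow-providers-postgres==1.0.0",
    "some-third-party-plugin",
    "git+https://example.invalid/repo.git",
]
allowed, rejected = split_requirements(requested)
print("install:", allowed)
print("refuse:", rejected)
```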
