A question for the AWS & GCP folks: how can a user install custom dependencies (pip packages) in their setups, especially on previous Airflow versions, where the Airflow webserver needed all the packages? If there is a way to solve that, could the same approach be used for providers as well?

On Tue, Jun 15, 2021 at 7:46 PM Ash Berlin-Taylor <[email protected]> wrote:

> Hi Subash,
>
> If your concern is about licensing then you have a false sense of compliance in 1.10 -- the dependencies for the "providers" between 1.10 and 2.0 haven't really changed -- the same (L)GPL, Facebook etc. licensed modules are still there in 1.10 and the 2.0 providers.
>
> My same question to you: can you elaborate (privately if necessary) on what the security concerns here are?
>
> -ash
>
> On Mon, Jun 14 2021 at 19:59:25 -0000, Subash Canapathy <[email protected]> wrote:
>
> Hi Jarek,
>
> Thank you for surfacing this issue for discussion. The major hurdle for managed services, apart from the security constraints, is on the licensing side. Previously, when the code needed for connection templates was part of Airflow, we were able to bundle it as a solution because the code was under the Apache v2 license. Now that it has been separated out into provider packages, those come with dependencies that do not have "blessed" licenses that allow bundling them into a managed service. I am sure the GCP folks have similar restrictions that explain why they cannot add all 60+ providers as is into the base image. We recently did the manual exercise of assessing each of those provider packages and their dependencies, and only 20 of them made the cut for not requiring additional licenses such as the Facebook license, LGPL etc.
>
> Thanks
> Subash Canapathy
>
> On 2021/06/14 16:28:46, Ash Berlin-Taylor <[email protected]> wrote:
>
> Can you elaborate (privately if you have to) on what the security concerns are? As I understand it the web server is per deployment, so anything should be limited to one customer/user/deployment.
>
> There is also the new "test connection" feature that will need the provider code installed to work.
>
> Then there's the issue of third party connections, of which there are only going to be more over time.
>
> -ash
>
> On 14 June 2021 16:35:42 BST, Eugen Kosteev <[email protected]> wrote:
> >Hi Jarek.
> >
> >Thanks for the discussion.
> >The issue with Connections management in the web server that you described has indeed affected Cloud Composer in the released preview image versions of Airflow 2.0.1 (link to public issue https://issuetracker.google.com/issues/190189297). And as you stated, we do not install pypi packages in the web server image, mostly because of security concerns.
> >
> >As a temporary workaround we baked all connections (a list of them with their widgets pickled and stored inside) into the web server image, so that customers can add/edit them (even though not all provider packages are pre-installed). This is a temporary workaround that we came up with for now and we are looking for a long-term solution.
> >
> >Our thoughts/ideas for alternative solutions:
> >1. We do not want to pre-install all provider packages, so as not to generate unnecessary python dependencies. Or maybe we could do this only for web server images (not scheduler/worker), but then it is not clear whether it is a good idea to have such a discrepancy between pypi dependencies in web server vs scheduler/worker images.
> >2. Downloading and baking provider packages (wheel files) into the docker image and installing the customer specific/required version on demand looks infeasible, taking into account the number of providers, their versions and their dependencies.
> >
> >- Eugene
> >
> >On Sun, Jun 13, 2021 at 6:46 PM Jarek Potiuk <[email protected]> wrote:
> >
> >> Dear Airflow community,
> >>
> >> Here is another result of discussions. I would like to draw attention to potential Connection management problems that might affect managed services for Airflow 2.0 and some providers.
> >>
> >> With Airflow 2.0, connection UI "customisations" are baked into the provider package, and in order to see, for example, the Postgres connection in the UI, you need to have the "postgres" provider installed in the Webserver.
> >>
> >> As far as I know some of the Managed Airflow services (MWAA, Composer, possibly others) do not currently allow their users to install additional packages in the webserver (the webserver container is different from the scheduler/worker). This makes it impossible to configure/edit provider connections via the UI (unless those providers are pre-installed in the webserver image).
> >>
> >> While it is understandable from a security point of view to forbid "any" package installation, I think the official "apache-airflow-providers-*" packages should be allowlisted for those images and installed or otherwise made available (for example via pre-installing all providers in the webserver image, if it is not possible from a security point of view to rebuild the image dynamically).
> >>
> >> I wonder what people (and especially the people from the MWAA and Composer teams) think about it - do I get it right about the security concerns? Any other comments?
> >>
> >> J.
> >
> >--
> >Eugene
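
For illustration only, the allowlisting idea from Jarek's original message could be sketched roughly as below. This is a minimal sketch under stated assumptions, not how MWAA or Composer actually handle webserver packages: the helper `split_requirements` and the constant `OFFICIAL_PREFIX` are hypothetical names, and the only rule applied is the "apache-airflow-providers-*" naming pattern mentioned in the thread. A name check alone says nothing about licensing or about whether a given package is trustworthy.

```python
import re

# Assumption: "official" providers are recognised purely by the
# "apache-airflow-providers-*" naming pattern quoted in the thread.
OFFICIAL_PREFIX = "apache-airflow-providers-"

# Leading distribution name of a pip-style requirement string.
_NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalize(name):
    """Normalize a distribution name (lowercase, runs of -_. become '-')."""
    return re.sub(r"[-_.]+", "-", name).lower()


def split_requirements(requirements):
    """Hypothetical helper: split requirement strings into (allowed, rejected).

    A requirement is allowed only if its distribution name starts with
    OFFICIAL_PREFIX. Direct references (URLs, "name @ url") are rejected,
    because the name alone does not say where the code comes from.
    """
    allowed, rejected = [], []
    for req in requirements:
        match = _NAME_RE.match(req)
        name = normalize(match.group(1)) if match else ""
        is_direct_reference = "@" in req or "://" in req
        if (
            not is_direct_reference
            and name.startswith(OFFICIAL_PREFIX)
            and len(name) > len(OFFICIAL_PREFIX)
        ):
            allowed.append(req)
        else:
            rejected.append(req)
    return allowed, rejected


if __name__ == "__main__":
    ok, blocked = split_requirements(
        ["apache-airflow-providers-postgres", "requests>=2"]
    )
    print("allowed:", ok)
    print("rejected:", blocked)
```

A managed service would presumably combine a check like this with its own license and security review, which is the part the thread is really about.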
