Just for clarity, a correction to the last paragraph: if a package is added by the "User", it is added only to "worker/scheduler".
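The corrected rule can be sketched as a small function. This is only an illustration under assumptions: the `Role` enum, the component names, and `install_targets` are hypothetical and are not part of any Airflow or MWAA API.

```python
from enum import Enum


class Role(Enum):
    """Hypothetical roles, following the Admin/User distinction in this thread."""
    ADMIN = "admin"
    USER = "user"


# Hypothetical component names, used only for this illustration.
WEBSERVER = "webserver"
SCHEDULER = "scheduler"
WORKER = "worker"


def install_targets(role: Role) -> frozenset:
    """Return the components a package is installed on, based on who added it."""
    if role is Role.ADMIN:
        # Admin-added packages (e.g. providers, plugins) go everywhere.
        return frozenset({WEBSERVER, SCHEDULER, WORKER})
    # User-added (DAG-only) packages never reach the webserver.
    return frozenset({SCHEDULER, WORKER})
```

The point of the sketch is that the decision depends on who installs the package, not on what the package is.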
J.

On Sat, Jun 19, 2021 at 1:03 AM Jarek Potiuk <[email protected]> wrote:
>
> Hey Subash, Jon
>
> > Plugins, providers, and their associated Python libraries all need to
> > execute code in order to be installed which is a vulnerability. Plugins in
> > particular are often developed/installed by the data engineers and not by
> > system administrators, leading us back to our original problem.
>
> @John - it's you who introduced the User/Admin separation and
> reasoning. I think you should follow the logical consequence of it and
> introduce different levels of access for those two types of users to
> manage the platform to address it. You can control who has access to
> install things and where. You are managing the access control to be
> able to reconfigure the MWAA already and I am sure you do not give
> casual "users" the ability to control certain aspects of the platform.
> I am sure you could restrict the ability to install packages on
> Webserver to only admins and have it open also for users in the
> "scheduler/worker". Is that not possible? It sounds like what you
> really need from your description.
>
> > @Jarek - you are right about the user/admin difference, it’s a
> > disambiguation that permeates beyond the airflow UI layer in MWAA - IAM
> > auth is used for determining authN and AuthZ, hence to secure the webserver
> > from un-authorized code, we would have to either a/ treat plugin updates as
> > an elevated permission activity, or b/ separate out the webserver intended
> > requirements/plugins from the ones required for DAGs so that the authZ can
> > be handled separately.
>
> Correct. This is exactly what I propose. Have a separate
> "providers/plugins" install which only admins can update. Any package
> added by "Admin" is added to both - webserver and worker/scheduler.
> If you want dag-only packages that are needed by "Users" they can be
> only added to workers. Sounds pretty straightforward.
>
> > We stayed with the one-DAG-bag ideology to not add complexity to customers
> > and coaching them on "if you add to A it goes here, and if B it goes to
> > webserver". That’s why we are now between a rock and a hard place - not
> > being able to open up all installs into webserver OR separate the DAG bag for
> > webserver and other entities.
>
> No. This is different. It's not "what" you install but "who" installs
> it. I just follow the distinction introduced by Josh - if your
> corporate customers have two distinct types of users, "Admins" and
> "Users", I think you should follow this and introduce those two
> different types of users. When a package is added by an Admin user, it
> should be added to both - webserver and worker/scheduler. If it is
> added by the "User" - then it is added only to "webserver/scheduler".
> Then if the admins (I guess those are the ones who need to configure
> connections anyway) - if they need a "connection type", they could add
> the right provider themselves. Users will not be able to add them.
> That completely solves the problem that Josh mentioned, I believe.
> Please correct me if I am wrong.
>
> > On 6/18/21, 1:36 PM, "Jarek Potiuk" <[email protected]> wrote:
> >
> > > That would certainly help a bit, but unfortunately it's not just the
> > > packages. It's the fact that authentication is tied to Python code that
> > > can be patched by anyone with permission to execute code on the web server,
> > > which in turn would give them access to packages or anything else
> > > they'd like.
> >
> > But in Airflow 2.0 the code provided by "DAG writers" is not executed
> > any more. This is entirely gone together with Airflow 1.10. This has
> > been handled by DAG serialization, which is the only option available
> > in 2.0.
> > I do not see how the "Users" could add any code if "Admins"
> > control the packages that are installed in the webserver. Now if
> > Admin/User is the only problem then I think this is really a
> > misunderstanding coming from the pre-DAG-serialization world of Apache
> > Airflow.
> >
> > J.
>
> --
> +48 660 796 129

--
+48 660 796 129
