[GitHub] [airflow] potiuk commented on a diff in pull request #33213: Implement `is_authorized()` in auth manager

via GitHub Mon, 28 Aug 2023 16:28:22 -0700


potiuk commented on code in PR #33213:
URL: https://github.com/apache/airflow/pull/33213#discussion_r1308050422



##########
airflow/auth/managers/models/resource_details.py:
##########
@@ -0,0 +1,31 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from __future__ import annotations
+
+from dataclasses import dataclass
+
+
+@dataclass
+class ResourceDetails:
+    """
+    Represents the details of a resource.
+
+    All fields must be optional. These details can be used in authorization 
decision.
+    """
+
+    id: str | None = None

Review Comment:
   I am not insisting - if you will make a decision to make one "is_authorized" 
method because you think it will be better, I  have no big issue with it - but 
I would like to warn a bit from taking FAB as the "model" for resources. I 
think we want to break out from the model introduced by FAB because it was 
"bad". 
   
   If you want to map the resources 1-1, yes you will have a lot of them. But I 
think we only have so many resources, because of FAB declarative method of 
defining the access via configuraiton. What we are planning to do is to 
implement AuthManager to define authentication as Code. it's very similar to 
what you'd compare Oozie and Airflow Workflows. If you want to convert oozie 
workflows to Airflow you'd end up with strange looking Airflow DAG, which you 
could much easier express in a Code in Airflow. 
   
   The FAB "resource" definition has been traditionally very fine-grained 
because every permission in FAB was "declaratively" defined by assigning 
permission to role. This was really the only way to achieve it so if you wanted 
to decide if particular "sub-resource" is accessible or not in a fine grained 
way - you had no choice but to define fine-grained resources. And it was one of 
the FAB problems that it was too fine grained in definitiion. The only way you 
could give/not give access to particular view in FAB was that you combine 
"ACTION/METHOD" - ADD/DEL etc. with "RESOURCE" and either give or reject the 
access. 
   
   For example if you wanted to decide to give/not give access to "DAG Code" 
you had to give access to RESOURCE_DAG_CODE. There was no other way. But. in 
our case we can define much more COARSE grained resources and decide in the 
Auth Manager implementation, whether it should be further fine-grained or not.
   
   This is for example why we only have access "per DAG" as resource - because 
we had  no way to express it differeintly. You could not express a "group of 
DAGs" or "permission per tag" for that very reason because FAB model was 
"fixed" on resources and you could not express "Give permissions to all DAGS 
that have tag=x". 
   
   But we can do soooo much better with AuthManagerAsCode.
   
   We can programmatically decide individually what you do in each case - you 
do not have to have separate resource for every single action you do in them - 
you can make the decision internally in the implementaiton of the Auth manager.
   
   For example instead of "DAG Code" resource have `auth_DAG` method. And this 
method can have both DagDetails and "accessing" field indicating which 
particular "part" of the DAG to access:
   
   ```python
   def auth_dag(dag_details: DagDetails, task_details: TaskDetails. | None, 
accessing_action: DagAccessAction, method: Method):
       ....
   ```
   
   Dag details:
   
   ```
   DagDetails:
       id: 
       tags:
       queue/tenant:
       ....
   ```
   
   Accessing Action: 
   
   ```
   DagAccessAction(Enum):
        code,
        dependencies.
        run,
        task_instance,
        import_error.
   ```
   
   ```
   MethodMethod):
       read,
       write,
       edit,
       delete
   ```
   
   Now - our implementation might decide, based on whatever it wants -whether 
to allow or not specific action. It can be as simple as:
   
   ```
   def auth_dag(dag_details: DagDetails, task_details: TaskDetails. | None, 
accessing_action: DagAccessAction, method: Method):
        if dag_details.queue == 'x':
            return True # Allow all DAG/TASK operations for DAG where queue is X
   ```
   
   Or as complex as FAB -  mapping the above to the original "resources" and 
looking up the matrix of permission in the configuration and have access or not 
to particular tasks beloning to DAG based on whatever criteria.
   
   Assumption here is that the permission set usually will be simple - a user 
can acces all dags that belong to the "queue" - if he belongs to admin group, 
"read only" if he is regular user, none if he is not there. 
   
   I believe it is much simpler. I think if we do it this way, then implementor 
of each of the Auth Manager will make a decision on how complex, fine grained 
and configurable it is. If AWS Auth Manager want to **just** define 3 levels of 
users (Admin/ReadAll/None) and use "queue" to determine the group where those 
users are looked up - they will be able to do it. 
   
   
   
   > My understanding is the number of resource type will not change, hence 
from 
[permissions.py](https://github.com/apache/airflow/blob/main/airflow/security/permissions.py#L19),
 there are 40 resources types and as such, we should have 40 methods? This 
looks really massive.
   
   Yes if you map 1-1. But we should IMHO group access logically and leave the 
decision on "fine-graining" acces to the implementor.
   
   Also a lot of those resources are not needed (user/admin/role goes away)
   
   Here is my breakdown (very quick). I split the FAB resources into logical 
resources that could be used as "Auth manager" entrypoints - each entrypoint 
will have corresponding parameters determinining - if needed - not only the id 
of the resource but also action we want to do and method used.
   
   User/Admin permissions  - not needed in the new model
   
   ```
   RESOURCE_ACTION = "Permissions"
   RESOURCE_ADMIN_MENU = "Admin"
   RESOURCE_AIRFLOW = "Airflow"
   RESOURCE_MY_PASSWORD = "My Password"
   RESOURCE_MY_PROFILE = "My Profile"
   RESOURCE_PASSWORD = "Passwords"
   RESOURCE_PERMISSION = "Permission Views"  # Refers to a Perm <-> View 
mapping, not an MVC View.
   RESOURCE_LOGIN = "Logins"
   RESOURCE_ROLE = "Roles"
   RESOURCE_USER = "Users"
   RESOURCE_USER_STATS_CHART = "User Stats Chart"
   ```
   
   Not needed because it can be implicit from other (connection/config etc).
   ```
   RESOURCE_BROWSE = "Browse"
   RESOURCE_RESOURCE = "View Menus"
   ```
   
   We likely not need it:
   
   ```
   RESOURCE_DOCS = "Documentation"
   RESOURCE_DOCS_MENU = "Docs"
   ```
   
   
   
   1. Pools:
   
   ```
   RESOURCE_POOL = "Pools"
   ```
   
   2. SLA:
   
   ```
   RESOURCE_SLA_MISS = "SLA Misses"
   ```
   
   3. Jobs:
   
   ```
   RESOURCE_JOB = "Jobs"
   ```
   
   4. Audit logs:
   
   ```
   RESOURCE_AUDIT_LOG = "Audit Logs"
   ```
   
   5. Configuration:
   
   ```
   RESOURCE_CONFIG = "Configurations"
   ```
   
   6. Connections:
   
   ```
   RESOURCE_CONNECTION = "Connections"
   ```
   
   7. Variables:
   
   ```
   RESOURCE_VARIABLE = "Variables"
   ```
   
   9. XComs:
   
   ```
   RESOURCE_XCOM = "Xooms"
   ```
   
   10. View (read-only state of the installation):
   
   ```
   RESOURCE_PLUGIN = "Plugins"
   RESOURCE_PROVIDER = "Providers"
   RESOURCE_TRIGGER = "Triggers"
   RESOURCE_WEBSITE = "Website"
   ```
   
   11. Dags:
   
   ```
   RESOURCE_DAG = "DAGs"
   RESOURCE_DAG_CODE = "DAG Code"
   RESOURCE_DAG_DEPENDENCIES = "DAG Dependencies"
   RESOURCE_DAG_RUN = "DAG Runs"
   RESOURCE_DAG_WARNING = "DAG Warnings"
   RESOURCE_TASK_INSTANCE = "Task Instances"
   RESOURCE_TASK_LOG = "Task Logs"
   RESOURCE_TASK_RESCHEDULE = "Task Reschedules"
   RESOURCE_IMPORT_ERROR = "ImportError"
   ```
   
   12. Cluster activity:
   
   ```
   RESOURCE_CLUSTER_ACTIVITY = "Cluster Activity"
   ```
   
   13. Dataset
   
   ```
   RESOURCE_DATASET = "Datasets"
   ```
   
   I can meet and discuss more tomorrow if needed, I just wanted to make sure 
that we do not "stick" to the current - IMHO broken model of FAB but we come up 
with something much more reasonable.
   
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [airflow] potiuk commented on a diff in pull request #33213: Implement `is_authorized()` in auth manager

Reply via email to