mistercrunch opened a new issue, #28377: URL: https://github.com/apache/superset/issues/28377
# Motivation

This SIP proposes a better security model for Superset, aimed at simplifying and strengthening the management of permissions across the platform. Our goal is to transition from the current model, which is heavily tied to the Flask AppBuilder (FAB) framework and its assumptions and limitations, to a more intuitive and scalable system. This new model will reduce complexity for administrators, align permissions with user expectations and the actual architecture of Superset, and enhance performance by streamlining permission checks. It will introduce a clearer, structured approach to defining resources and actions, incorporate a flexible, attribute-based access control (ABAC) system, and lay the groundwork for decoupling from FAB, thus paving the way for a more robust and extensible security framework.

## Looking back…

First, let’s analyze the current security model and how it evolved. Using Flask AppBuilder (FAB) as a foundational framework, we inherited its dynamic security model, which automatically generates one permission for every view method (essentially any method that returns a web response), including those inherited from its base classes. In environments where FAB manages only a few models, this approach is manageable. However, in Superset, which supports a large number of models and custom methods, this model expands to hundreds of permissions, many of which do not meaningfully correspond to the mental model of users or administrators.

In 2020, we undertook efforts to rationalize the number of permissions. @dpgaspar enhanced the **`BaseView`** in FAB to "map" a class name to a permission using **`class_permission_name`** and **`method_permission_name`**, allowing FAB users to associate any given view with a specific permission. We applied this mapping extensively across Superset, reducing the number of atomic permissions from about 500 to around 100.
This reduction was achieved through a series of PRs, such as [this one](https://github.com/apache/superset/pull/12012), and by using mapping logic coded [here](https://github.com/apache/superset/blob/master/superset/constants.py#L122-L170). Note that these PRs were cautious in reassigning permissions to roles through database migrations.

## Issues with the Current Model

- **Unconfined Sprawl**: Many permissions only loosely fit our users' mental models or the application’s information architecture.
- **Permission Confusion**: Many issues on GitHub and questions on Slack stem from the fact that our permission model is neither comprehensive nor documented. It is also hard to document as is, since it is not very intelligible.
- **FAB Coupling**: Our tight coupling with Flask AppBuilder is increasingly problematic, as FAB's generic approach to permissions diverges from Superset's specific needs. Permissions tend to get tied to the technical implementation as opposed to our information architecture and use cases.
- **Balance Issues**: We struggle to find the right balance between simplicity and flexibility. Some permissions are too granular, making them difficult to manage, while others are too broad, limiting our ability to grant precise access controls. Arguably, the efforts in 2020 brought things from “too atomic” to “not atomic enough” in some areas.
- **Performance**: The absence of pattern matching leads to frequent database queries to check individual permissions against a long list. Rules expressed as patterns are much cheaper to store and evaluate: for example, `read.*` is far denser than listing out 100 read-related permissions.
- **Poor UX**: The UI for managing roles is cumbersome, combining an overwhelming mix of permissions in an infinitely long list, often with names that are unclear even to the developers who implemented the underlying logic.

### Proposed Change

# **Goals**

This SIP aims to overhaul Superset’s security model to achieve the following key objectives:

1. **Establish a Sensible and Scalable Permission System**: Create a security framework where every permission adheres to a clear and logical pattern, easily understood and managed by users and administrators. This system should simplify permissions without sacrificing the granularity needed for precise access control.
2. **Prevent Permission Sprawl**: Implement a robust governance strategy that mandates attaching future methods to a well-defined permission-naming scheme. This will involve strict guidelines for developing new features or modifying existing ones, ensuring that they conform to the established permission architecture.
3. **Facilitate Decoupling from Flask AppBuilder (FAB)**: Develop strategies to reduce dependency on FAB, paving the way for a centralized policy enforcement mechanism that better aligns with Superset’s specific requirements. This decoupling will allow for greater flexibility and the adoption of more advanced security practices that are currently constrained by FAB’s architecture.
4. **Improve System Performance and User Experience**: Enhance the performance of the security model by optimizing how permissions are checked and reducing database query loads. Simultaneously, revamp the UI for role management to make it more intuitive and user-friendly, avoiding the pitfalls of the current system which mixes numerous types of permissions in a confusing and unwieldy manner.
5. **Enable Advanced Security Features**: Introduce advanced security capabilities such as attribute-based access control (ABAC) with a flexible, domain-specific language (DSL) that supports dynamic permission queries.
This will allow for more sophisticated, context-sensitive security policies that can adapt to complex organizational needs.
6. **Provide a Path Forward**: For everyone migrating to the new model, we want to find a clear path that guarantees backwards compatibility and/or forces them to make decisions where required.

# Resources, Actions and Subjects



Let’s move away from the current approach, which dynamically creates an individual permission for each [view class_name] and [view.method_name], and towards a stricter set of clearly defined **resources** and **actions**.

## Resources

We will define a finite set of resources that align with our information architecture and application structure. These resources represent logical groupings within Apache Superset, reflecting both the UI and underlying data structures. Here are the primary resources identified:

**Core entities**

- **Chart**: Individual visualizations within dashboards.
- **Dashboard**: Collections of charts and visualizations.
- **SavedQuery**: Stored SQL queries that can be reused.
- **User**: Represents individual users with access to the system.
- **Report**: Scheduled or triggered reports based on data within the system.
- **Dataset**:
- **Embedded Dashboard**: as a special kind of dashboard, some users might have access
- …

**Data access entities**

- **Database**: The databases accessible within Superset.
- **Catalog**: Groupings of schemas within a database.
- **Schema**: Specific schemas within a database catalog.
- **Relation** (table or view): The datasets that are queried directly by users.
- **Row-Level Security (RLS)**: Applies fine-grained access control at the row level within datasets.

### **Resource ABAC Selectors - A Simple Yet Evolutive DSL**

To effectively manage permissions across these resources, we introduce a simple yet powerful Domain-Specific Language (DSL) for defining attribute-based access controls (ABAC).
This DSL allows administrators to specify and enforce security policies directly related to the attributes of resources. Here are the key features of our ABAC DSL:

<aside>
💡 If anyone knows a good DSL for this, or a specific tool that nailed this, it’d be great to reuse a standard like RISON (but maybe not quite like it). Ideally the DSL has Python and JS parsers.
</aside>

**Features:**

- **Equality**: Specify direct equality for any resource attribute to match specific values.
    - Example: **`Dashboard.id.equal(1)`**: grants access to the Dashboard with ID 1.
- **Sets**: Define access based on membership in a set of values for any attribute.
    - Example: **`Dashboard.id.in(1, 2, 3)`**: grants access to Dashboards with IDs 1, 2, or 3.
- **Negative Flip**: Exclude specific values using a negation operator.
    - Example: **`!Dashboard.id.in(1, 2, 3)`**: grants access to all Dashboards except those with IDs 1, 2, or 3.
- **Logical Operators**: Use **`and`** and **`or`** to combine multiple conditions, making it possible to define complex policies.
    - Example: **`Dashboard.published.equal(true) and (User.role.in('Admin', 'Editor') or !Dashboard.confidential.equal(true))`**
    - This policy grants access to published Dashboards that are not confidential; confidential published Dashboards are accessible only to users with the 'Admin' or 'Editor' role.

**Set Operations:**

- **Intersection** (`&`): Combines conditions where all must be true.
- **Union** (`|`): Combines conditions where at least one must be true.
- **Difference** (`-`): Specifies conditions that must not be true to grant access.

This DSL is designed to be easy to use, read, and integrate into our existing systems while being robust enough to handle complex permission scenarios. The use of familiar logical operators and condition structures ensures that policies are both transparent and maintainable.
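To make the selector semantics above concrete, here is a minimal sketch of how equality, set membership, negation, and the three set operations could compose. It is not a proposed implementation: `Selector`, `equal`, and `in_` are hypothetical names, resources are modelled as plain dictionaries rather than ORM objects, and a real version would parse the DSL text instead of building selectors in Python.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Assumption: a selector is modelled as a predicate over a resource's attributes.
Predicate = Callable[[dict], bool]


@dataclass(frozen=True)
class Selector:
    """Hypothetical wrapper that gives predicates the DSL's operators."""

    test: Predicate

    def __call__(self, resource: dict) -> bool:
        return self.test(resource)

    def __and__(self, other: "Selector") -> "Selector":  # intersection (&)
        return Selector(lambda r: self(r) and other(r))

    def __or__(self, other: "Selector") -> "Selector":  # union (|)
        return Selector(lambda r: self(r) or other(r))

    def __sub__(self, other: "Selector") -> "Selector":  # difference (-)
        return Selector(lambda r: self(r) and not other(r))

    def __invert__(self) -> "Selector":  # negation, written `!` in the DSL
        return Selector(lambda r: not self(r))


def equal(attr: str, value: Any) -> Selector:
    """Match resources whose attribute equals the given value."""
    return Selector(lambda r: r.get(attr) == value)


def in_(attr: str, *values: Any) -> Selector:
    """Match resources whose attribute is one of the given values."""
    allowed = frozenset(values)
    return Selector(lambda r: r.get(attr) in allowed)


# Published and not confidential, excluding dashboards 1, 2 and 3.
rule = (equal("published", True) - equal("confidential", True)) & ~in_("id", 1, 2, 3)

dashboards = [
    {"id": 1, "published": True, "confidential": False},
    {"id": 4, "published": True, "confidential": False},
    {"id": 5, "published": True, "confidential": True},
    {"id": 6, "published": False, "confidential": False},
]
visible_ids = [d["id"] for d in dashboards if rule(d)]
print(visible_ids)
```

Keeping every selector a pure predicate over a small, fixed set of attributes is what would make the same expression tree translatable into a database filter later on.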
### About serving lists efficiently…

Another important property of this DSL is that its expressions can be translated into SQLAlchemy `.where()` clauses, which are in turn compiled to SQL. This matters because we often need to list the resources available to a user, and we need to execute these filtering clauses at the database level (as in “show me a list of 50 charts that this user has access to”, then paginate through the results).

### **Limiting related attributes**

While it’s easy to think about highly specific rules and selectors (as in `Dashboard.obscure_property.groups.json.Xor(...)`), we’ll want to clearly state which attributes get exposed, and limit them to the very few that are needed to craft rules. For instance, while `id.in()` is powerful and important, allowing people to create ABAC rules based on obscure attributes that could be mutated over time seems hard to support. Fewer selectors are better. Version 1 may just have `id.in()`, for instance, and slowly evolve to support more attributes and operators.

A few intricate-yet-relevant attributes include the concept of “ownership” (is the user one of the owners of the object) and a private vs. published status. The former may require specific logic, as we’re looking at a many-to-many relationship as it relates to a specific, active user. Maybe a simple magic function like `Dashboard.@is_owner`, as opposed to something like `Dashboard.owners.id.includes(@current_user.id)`.

### Data Access Resource

Data access-related resources in Superset represent the hierarchical and external nature of database elements accessible within the platform. These resources include **Databases**, **Catalogs**, **Schemas**, **Relations (tables and views)**, **Columns** and **Rows**.
Unlike other entities in Superset, these resources are not governed by typical CRUD operations, because they are managed externally and have inherent structural dependencies.

**Key Characteristics:**

- **Externally Managed**: These resources reflect structures and permissions that are managed outside of Superset, typically by database administrators or through external database management systems. As such, Superset's role is limited to interfacing with these permissions rather than controlling their definitions.
- **Query-only Interaction**: The primary interaction with these resources is through querying. Unlike other resources where full CRUD operations might be applicable, the actions available for Data Access Resources are generally restricted to "read" or "query" operations, aligning with the data governance policies set at the database or enterprise level.
- **Hierarchical Nature**: The access control for these resources needs to respect their hierarchical structure. For instance, access granted at a database level might inherently imply access to its catalogs, schemas, and datasets, unless explicitly restricted. This hierarchical permission model must be carefully designed to ensure it does not inadvertently grant broader access than intended.

To solve for this, I’d like to bring in a special resource called `DataDomain` that would encompass all of this under a single resource that can be used to specify a set of objects, or domain, as opposed to having a 4-5 level hierarchy of objects. `DataDomain` becomes a special selector to target a set of tables, a schema, or a whole database. Many DataDomains can be combined in a permission, for example `DataDomain(db=1, schema_match='core.*')`.

## **Actions**

To ensure a clear and scalable security model, we introduce a hierarchical naming convention for actions using colon-separated words, which supports pattern matching and fine-grained access control.
This structure allows us to define actions in a way that makes them intuitive and consistent across different parts of the system.

### **Hierarchical Structure**

The action naming convention is designed to be hierarchical, facilitating both broad and precise permission settings. Actions are segmented into levels, allowing policy-makers to specify permissions at various granularities using wildcards. For example, a **`write:.*`** pattern would grant all write-related permissions on a resource, while a more specific pattern like **`write:delete:.*`** would apply only to deletion operations (`write:delete:one`, `write:delete:bulk`, …).

**Example Actions:**

- **`write:delete:one`**: Applies to deleting a single item.
- **`write:delete:bulk`**: Applies to bulk deletion operations.
- **`write:update`**: Applies to update operations.
- **`write:insert:one`**: Applies to inserting a single item.
- **`write:insert:bulk`**: Applies to bulk insertion operations.
- **`read:export:csv`**: Specific to exporting data in CSV format.

### **Structuring the Action Hierarchy**

The hierarchy is constructed from a curated dictionary of terms that are clear and relevant to our operations. Each term is carefully chosen to ensure it aligns with common actions within Apache Superset, yet is flexible enough to accommodate unique operational requirements. The top levels of the hierarchy are **strict**, while the deeper levels open up to allow more atomic actions and a higher cardinality of words.

**Strict Top-Level Actions:**

- **`read`**: For read-only operations that do not affect the state.
- **`write`**: For creating, modifying, or deleting data.
- **`grant`**: To give the right to grant permissions to others.

<aside>
💡 `grant` brings a fair amount of complexity and should probably be saved for a future iteration beyond v1.
It’s still important to show how the model could evolve to support more advanced use cases.
</aside>

**Strict second level for `read`:**

- **`one`**: To display data without changes.
- **`list`**: To list data without modifications.
- **`export`**

**Strict second level for mutations under `write`:**

- **`delete`**: For removal operations.
- **`put`**: …
- **`post`**: …

**Other commonly used verbs in actions:**

- **`export`**: For exporting data out of the system.
- **`import`**: For importing data into the system.
- **`bulk`**

**Flexible Terminology for Specific Actions:**

For actions specific to certain functions or data formats, we use more flexible terminology as the last element of our action strings (e.g., **`csv`**, **`excel`**). This allows for the easy introduction of more atomic actions that target very specific features where required.

**Pattern Matching:**

While related-attribute matching makes sense for Resources (as specified above), a simpler string-matching approach should work for actions:

- **`.*`**: Matches any sequence of actions within a specified level.
- **`!{pattern}`**: Excludes actions matching the specified pattern.
- regex?

This approach to defining actions enhances the flexibility and clarity of our security model. By using a structured hierarchy and clear terminology, we ensure that permissions are both manageable and transparent, allowing administrators to effectively control access across various parts of Apache Superset.

Notes:

- `write` does not imply `read`, as it would be unclear whether any given write includes all reads. While conceptually read may often be implied from write, this model forces you to specify which level of read and write you want to allow independently. Note that `.*` would imply all types of reads and writes.

## Subject

In the Superset security model redesign, a "Subject" represents an entity directly associated with a user or actor interacting with the system.
Subjects play a crucial role in determining access permissions and enforcing security policies. Here's a breakdown of key aspects related to Subjects:

1. **Special Resource Entity**: In the context of the security model, a Subject is treated as a special type of "Resource." While traditional resources represent data entities or system components, Subjects specifically pertain to users or actors within the system.
2. **Subject Selectors**: Similar to other resources, Subjects are associated with "subject selectors." These selectors define attributes or properties of the user entity and are used to determine access rights and permissions. For example, a subject selector might specify user roles, groups, or individual user identifiers.
3. **Association with Permissions**: Subjects are directly associated with permissions in a Policy. When defining access controls, administrators specify which actions users (Subjects) are permitted to perform on specific resources. This association allows for granular control over user access and ensures that permissions are accurately enforced.

## Permission

In this framework, a **`Permission`** is simply a combination of:

- one or many `Resource` **patterns** (as in `Dashboard.uuid.in("43242ee")`), which can be [dynamically] computed into an array of objects.
- one or many `Action` **patterns**, as in `write:.*`
- can optionally be named and referenced
- can optionally have a longer **description** attached for convenience
- NOTE: the sequencing (ordering) of the expressions may matter and help define the precedence of rules if/when they are ambiguous

```yaml
# JSON SCHEMA
Permission:
  type: object
  properties:
    name:
      type: string
    description:
      type: string
    resources:
      type: array
      items:
        type: string
    actions:
      type: array
      items:
        type: string
```

```yaml
# DRAFT of what a superset_policies.yml could look like
permissions:
  admin_permissions:
    description: Can do EVERYTHING on all RESOURCES
    permissions:
      - resources:
          - .*
        actions:
          - write:.*
          - read:.*
  viewer_permissions:
    description: Can browse and view most non-system, published objects
    permissions:
      - resources:
          - Chart
          - Dashboard
          - Query
          - SavedQuery
          - ...
        actions:
          - read:.*
      # Can't see/view unpublished or private dashboards
      - resources:
          - Dashboard.published.equal(false)
        actions:
          - "!read:.*" # quoted so that YAML does not parse `!` as a tag
  creator_owner_permissions:
    description: |
      Can create new objects, and has full power on things they created/own.
      NOTE: making a Viewer person an owner wouldn't bypass the policy as it does today
    permissions:
      - resources:
          - Chart
          - Dashboard
          - Query
          - SavedQuery
          - ...
        actions:
          - write:post # so that they can CREATE new objects
      # Full power on resources that they own
      - resources:
          - .*[owns]
        actions:
          - write:.*
  full_data_reader_permissions:
    permissions:
      - resources:
          - DataDomain:.*
        actions:
          - read
  FinanceDataReader:
    permissions:
      - resources:
          - DataDomain:schema=finance
        actions:
          - read
  FullMinusFinanceDataReader:
    permissions:
      - resources:
          - DataDomain:schema=finance
        actions:
          - "!read"
```

## Policy (combining resource, actions and subjects)

Building upon permissions, a Policy is:

- a named entity, with an optional description
- a collection of permissions
- a collection of subject selectors

```yaml
# JSON schema
Policy:
  type: object
  properties:
    name:
      type: string
    description:
      type: string
    permissions:
      type: array
      items:
        $ref: "#/Permission"
    subject_selectors:
      type: array
      items:
        type: string
```

```yaml
# DRAFT of what a superset_roles.yml could look like
Roles:
  Admin:
    - policies:
        - Admin
    - groups:
        - ldap.sysops
    - users:
        - j...@preset.io
  Alpha:
    - policies:
        - CreatorOwner
        - Viewer
        - FullDataReader
    - groups:
        # maybe upon importing / syncing ldap group with custom logic,
        # we prefix those to be referenced in places like this roles.yml file
        - ldap.data_scientists
        - ldap.data_engineers
    - users:
        - j...@preset.io
  FinanceTeam:
    - policies:
        - CreatorOwner
        - Viewer
        - FinanceDataReader
    - groups:
        - ldap.finance_team
  BronzeUsers:
    - policies:
        - CreatorOwner
        - Viewer
        - FullMinusFinanceDataReader
    - groups:
        - ldap.finance_team
```

## **Groups and Other User Attributes**

In our ongoing efforts, we're introducing a new entity: Groups. These groups serve as straightforward collections of users, with membership becoming a pivotal element of subject selectors.

As the framework evolves, we can extend support to more intricate subject selectors. For instance, we could implement selectors based on the domain of a user's email address.
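The chain described above (a user's groups select roles, roles reference policies, and policies carry permissions made of resource and action patterns) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Permission`, `Policy`, and `Role` dataclasses, the `roles_for` and `is_allowed` helpers, the `NoExport` policy and the `ldap.contractors` group are all hypothetical; patterns are treated as regular expressions matched with `re.fullmatch`; resources are matched by type name only (no attribute selectors); and a matching `!` rule is assumed to win over any allow rule, which the proposal leaves open.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Permission:
    resources: list[str]  # regex patterns over resource type names
    actions: list[str]    # regex patterns; a leading "!" marks a deny rule


@dataclass
class Policy:
    name: str
    permissions: list[Permission]


@dataclass
class Role:
    name: str
    policies: list[str]
    groups: set[str] = field(default_factory=set)
    users: set[str] = field(default_factory=set)


def roles_for(user: str, user_groups: set[str], roles: list[Role]) -> list[Role]:
    """A role applies if the user is listed directly or shares a group with it."""
    return [r for r in roles if user in r.users or r.groups & user_groups]


def is_allowed(action: str, resource: str, policies: list[Policy]) -> bool:
    """Assumed semantics: any matching deny rule wins over any allow rule."""
    allowed = False
    for policy in policies:
        for perm in policy.permissions:
            if not any(re.fullmatch(p, resource) for p in perm.resources):
                continue
            for pattern in perm.actions:
                if re.fullmatch(pattern.lstrip("!"), action):
                    if pattern.startswith("!"):
                        return False
                    allowed = True
    return allowed


policies = {
    "Viewer": Policy("Viewer", [Permission(["Chart", "Dashboard"], ["read:.*"])]),
    "CreatorOwner": Policy("CreatorOwner", [Permission(["Chart", "Dashboard"], ["write:post"])]),
    "NoExport": Policy("NoExport", [Permission([".*"], ["!read:export:.*"])]),  # hypothetical
}
roles = [
    Role("Alpha", ["Viewer", "CreatorOwner"], groups={"ldap.data_scientists"}),
    Role("Restricted", ["NoExport"], groups={"ldap.contractors"}),  # hypothetical
]


def policies_for(user: str, groups: set[str]) -> list[Policy]:
    """Collect the policies of every role that applies to the user."""
    return [policies[name] for role in roles_for(user, groups, roles) for name in role.policies]


alice = policies_for("alice", {"ldap.data_scientists"})
bob = policies_for("bob", {"ldap.data_scientists", "ldap.contractors"})
print(is_allowed("read:export:csv", "Dashboard", alice))
print(is_allowed("read:export:csv", "Dashboard", bob))
```

Because the whole lookup works on in-memory strings and sets, a check like this needs no database round trip once the roles and policies are loaded.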
# Static and Dynamic Objects

For some use cases, many of the objects defined above, namely Permissions, Policies, Roles, and UserGroups, are best defined fairly statically, as in they shouldn’t really change from one deployment to the next. We may want these to be largely defined as code and not alterable at runtime.

For other use cases, things need to be more dynamic, to the point where it’s reasonable for Administrators to point-and-click in the UI to grant people access to certain things. Those objects can be altered by users in real time, and changes should take effect immediately.

That leaves us with a combination of static and dynamic objects that can be combined in an environment. Some environments may choose to have a few rules defined as code, while others may have a much more dynamic set of rules defined in the UI.

## Storage

OK, so some policies are static (configs), and some policies are dynamic (stored in the database). In all cases, they are stored as collections of action and resource “pattern” strings. For consistency, we *could* push the statically defined ones into the DB and mark them as read-only, but don’t have to (?)

## Sizing / caching / Performance

In a complex environment, what would be the size of the whole “role/policy book”? Probably megabytes at most, which I believe could and should be cached in the memory of all backend services. That means that if someone updates a database-stored rule, we (1) update the DB, and (2) expire the in-memory cache and force a refresh in all processes. Alternatively, or in addition, the whole policy book could be stored in Redis. Now, if we know the user's roles in the user session, looking them up against the in-memory policy book should be super cheap.

In any case, policy and role membership should be fairly slowly changing and fit nicely in an in-memory cache. PolicyManager should be extremely fast at processing assertions.
That should be a massive improvement from before, where we did a lot of lookups of specific perms in the database against a fairly large list.

# The Playbook

At a high level, and given the complexity of the project, we recommend bringing in the new security model alongside the existing one, and running both in parallel for some time prior to deprecating the old model. Why?

- allows for comparison/validation checks during a transition period
- allows for smoother migrations in large/complex environments
- avoids a gigantic release branch and all sorts of merge conflicts
- avoids the scary big-bang approach: organizations can decide how/when to make the switch independently from upgrades

## 1. Inject Resource and Action semantics

For every single view/method in Superset:

- define a resource from a clearly identified list
- define the action, following the scheme defined above
- once everything is covered, prevent sprawl, and ensure all new methods are forced to use Resource+Action semantics

## 2. Introduce a PolicyManager

- can answer booleans on “Can user perform this action on this resource?”
- can provide/apply filters for resource lists: “give me a filtering criteria for Dashboards for this user based on her/his permissions”

Sketching what SecurityManager could look like:

```python
from __future__ import annotations  # Permission and Constraint are yet to be defined

from sqlalchemy.orm import Query


class SecurityManager:
    def view_decorator(self, permissions: list[Permission]):
        """Specify and enforce permission requirements for a view."""

    def check_permissions(self, permissions: list[Permission], user) -> bool:
        """Check that a user has a set of permissions."""

    def apply_resource_filter(self, query: Query, user) -> Query:
        """Given a resource query, append `.filter()` to limit it to the user's access rights."""

    def provide_constraints(self, user, resource_type) -> list[Constraint]:
        """Given a resource type, return the list of constraints for a given user."""
```

We probably need some sort of `PolicyManagerViewMixin`, or maybe it injects itself into `SupersetBaseView`, but we need some construct to ensure permission semantics at a deep level: maybe it force-maps methods to have permission semantics, prevents unmapped methods, and auto-assigns things based on convention. The decorator above is ideal because it is explicit, but there are a lot of views we inherit from FAB that therefore need to be mapped to action/resource after the fact.

**Conflict resolution:** the policy manager knows how to bubble up conflicting rules and/or how to resolve/log conflicts.

**Auditing/logging:** it should be possible to log every assertion (user X asked for access to resource Y to perform action A). We probably need some sort of PolicyLogger hook.

## 3. Introduce a feature flag

Wherever permissions are checked, introduce logic for a feature flag to decide whether the old way or the new way should be executed. **Maybe there’s a DEBUG mode where we execute both, look for differences, and alert where disagreements are found.**

## 4. Build a UI

Here’s a Google Sheet containing an extraction of the current permissions in Superset → https://docs.google.com/spreadsheets/d/13CQQX5MhhSH99ZsnHlSmSyZeMUcQ6TWhoLAuMI4hAGs/edit#gid=680477257

- A policy editor, where a policy is a collection of Permissions (accumulator pattern)
    - Each Permission is a collection of
        - Resource selectors (accumulator of text patterns)
        - Action patterns (either a tree navigator OR an accumulator of text patterns)



- A Role editor, where a role is
    - Name
    - Description
    - A collection of policies
    - A collection of Groups (accumulator)
    - A collection of Users (accumulator)