giftig opened a new issue, #25040: URL: https://github.com/apache/superset/issues/25040
On init, the security manager syncs role definitions for Admin, Alpha, Gamma, etc. by invoking `set_role` with each role, passing in a function which filters `PermissionView` instances and determines if they should be in that role. `set_role` then fetches all `PermissionView` instances from the database and invokes the filter function. The filter function checks a load of rules about viewmenu and permission name, so ultimately we need to look at `pvm.permission.name` and `pvm.view_menu.name` for every `PermissionView` we retrieved from the database: that's where this function is very inefficient. Because the relations with `Permission` and `ViewMenu` are lazy, it's executing another 2 SELECT queries each for each `pvm` to get those two names, which should have been eagerly loaded when getting all the instances. Additionally, since all `PermissionView` instances are fetched for every invocation of `set_role`, these queries are all run another 5x each. So the query count being executed by this sync task is approximately: ``` 5 * 2 * (pvm_count) ``` There's also a `ViewMenu` for every dataset, so `pvm_count` can easily be 10000+. As a result of that, we're seeing this task hammer the database and take upward of 20 mins to execute. If we update it to eagerly load the relations it's likely we could reduce this to a single query and make it ~100x faster. #### How to reproduce the bug 1. Set `SQLALCHEMY_ECHO = True` 2. Run the `sync_role_definitions` task, either from the flask shell or as part of the `superset init` process 3. Observe a large number of individual SELECT queries being performed. ### Expected results Should be faster and use fewer queries ### Actual results Slow, lots of queries. ### Environment (please complete the following information): - browser type and version: N/A - superset version: `2.1.0`, but also checked the function in `master` - python version: N/A - node.js version: `N/A` - any feature flags active: N/A ### Checklist Make sure to follow these steps before submitting your issue - thank you! - [x] I have checked the superset logs for python stacktraces and included it here as text if there are any. - [x] I have reproduced the issue with at least the latest released version of superset. - [x] I have checked the issue tracker for the same issue and I haven't found one similar. ### Additional context Here's a sample of some of the queries observed: ```sql 2023-08-21 13:13:04,272 INFO sqlalchemy.engine.Engine SELECT ab_view_menu.id AS ab_view_menu_id, ab_view_menu.name AS ab_view_menu_name FROM ab_view_menu WHERE ab_view_menu.id = %(pk_1)s ``` clearly showing it's looking up entries individually. I have a pretty good idea how to fix this so I will raise a PR if I get time, but it's quite a simple fix so perhaps someone who wants to contribute a small change to the project could take a look at is as well. The inefficiency doesn't have enormous impact because it happens during init, but it does nevertheless slow down the init process a lot when you have many datasets, so it would be good to fix. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
