giftig opened a new issue, #25040:
URL: https://github.com/apache/superset/issues/25040

   On init, the security manager syncs role definitions for Admin, Alpha, 
Gamma, etc. by invoking `set_role` with each role, passing in a function which 
filters `PermissionView` instances and determines if they should be in that 
role. `set_role` then fetches all `PermissionView` instances from the database 
and invokes the filter function.
   
   The filter function checks a load of rules about viewmenu and permission 
name, so ultimately we need to look at `pvm.permission.name` and 
`pvm.view_menu.name` for every `PermissionView` we retrieved from the database: 
that's where this function is very inefficient. Because the relations with 
`Permission` and `ViewMenu` are lazy, it's executing another 2 SELECT queries 
each for each `pvm` to get those two names, which should have been eagerly 
loaded when getting all the instances. Additionally, since all `PermissionView` 
instances are fetched for every invocation of `set_role`, these queries are all 
run another 5x each.
   
   So the query count being executed by this sync task is approximately:
   
   ```
   5 * 2 * (pvm_count)
   ```
   
   There's also a `ViewMenu` for every dataset, so `pvm_count` can easily be 
10000+.
   
   As a result of that, we're seeing this task hammer the database and take 
upward of 20 mins to execute. If we update it to eagerly load the relations 
it's likely we could reduce this to a single query and make it ~100x faster.
   
   #### How to reproduce the bug
   
   1. Set `SQLALCHEMY_ECHO = True`
   2. Run the `sync_role_definitions` task, either from the flask shell or as 
part of the `superset init` process
   3. Observe a large number of individual SELECT queries being performed.
   
   ### Expected results
   
   Should be faster and use fewer queries
   
   ### Actual results
   
   Slow, lots of queries.
   
   
   ### Environment
   
   (please complete the following information):
   
   - browser type and version: N/A
   - superset version: `2.1.0`, but also checked the function in `master`
   - python version: N/A
   - node.js version: `N/A`
   - any feature flags active: N/A
   
   ### Checklist
   
   Make sure to follow these steps before submitting your issue - thank you!
   
   - [x] I have checked the superset logs for python stacktraces and included 
it here as text if there are any.
   - [x] I have reproduced the issue with at least the latest released version 
of superset.
   - [x] I have checked the issue tracker for the same issue and I haven't 
found one similar.
   
   ### Additional context
   
   Here's a sample of some of the queries observed:
   
   ```sql
   2023-08-21 13:13:04,272 INFO sqlalchemy.engine.Engine SELECT ab_view_menu.id 
AS ab_view_menu_id, ab_view_menu.name AS ab_view_menu_name 
   FROM ab_view_menu 
   WHERE ab_view_menu.id = %(pk_1)s
   ```
   clearly showing it's looking up entries individually.
   
   I have a pretty good idea how to fix this so I will raise a PR if I get 
time, but it's quite a simple fix so perhaps someone who wants to contribute a 
small change to the project could take a look at is as well. The inefficiency 
doesn't have enormous impact because it happens during init, but it does 
nevertheless slow down the init process a lot when you have many datasets, so 
it would be good to fix.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to