justinpark opened a new pull request, #32608: URL: https://github.com/apache/superset/pull/32608
### SUMMARY Currently, the `recent_activity` query for the log table groups by the dashboard ID and slice ID to extract a distinct list from the entire log table, which leads to performance issues. (In the case of Airbnb, more than 1 million logs are generated each day, and grouping by dashboard and slice ID, even with indexing, significantly impacts database performance as shown in the following log) ``` | 3793593 | superset | 100.117.121.20:41904 | superset_production | Query | 1726 s| Sending data | SELECT anon_1.dashboard_id AS anon_1_dashboard_id, anon_1.slice_id AS anon_1_slice_id, anon_1.action AS anon_1_action, anon_1.dttm AS anon_1_dttm, dashboards.slug AS dashboard_slug, dashboards.dashboard_title AS dashboards_dashboard_title, slices.slice_name AS slices_slice_name FROM (SELECT logs.dashboard_id AS dashboard_id, logs.slice_id AS slice_id, logs.action AS action, max(logs.dttm) AS dttm FROM logs WHERE logs.action IN ('explore', 'dashboard') AND logs.user_id = 13295 AND logs.dttm > '2024-03-06 18:00:12.201653' AND (logs.dashboard_id IS NOT NULL OR logs.slice_id IS NOT NULL) GROUP BY logs.dashboard_id, logs.slice_id, logs.action) AS anon_1 LEFT OUTER JOIN dashboards ON dashboards.id = anon_1.dashboard_id LEFT OUTER JOIN slices ON slices.id = anon_1.slice_id WHERE dashboards.dashboard_title != '' OR slices.slice_name != '' ORDER BY anon_1.dttm DESC LIMIT 0, 6 | ``` To resolve this issue, it would be appropriate to create a materialized activity statistics view through a daily/hourly batch job. However, functionally, the main purpose of recent activity is to display only a few of the most recently visited items. Therefore, we improved performance by changing the approach to fetch the latest (including duplicates) log entries (by `distinct: false`) and extract a distinct list using LRU cache in the frontend side. ### TESTING INSTRUCTIONS specs ### ADDITIONAL INFORMATION <!--- Check any relevant boxes with "x" --> <!--- HINT: Include "Fixes #nnn" if you are fixing an existing issue --> - [ ] Has associated issue: - [ ] Required feature flags: - [ ] Changes UI - [ ] Includes DB Migration (follow approval process in [SIP-59](https://github.com/apache/superset/issues/13351)) - [ ] Migration is atomic, supports rollback & is backwards-compatible - [ ] Confirm DB migration upgrade and downgrade tested - [ ] Runtime estimates and downtime expectations provided - [ ] Introduces new feature or API - [ ] Removes existing feature or API -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@superset.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: notifications-unsubscr...@superset.apache.org For additional commands, e-mail: notifications-h...@superset.apache.org