justinpark opened a new pull request, #32608:
URL: https://github.com/apache/superset/pull/32608

   ### SUMMARY
   Currently, the `recent_activity` query for the log table groups by the 
dashboard ID and slice ID to extract a distinct list from the entire log table, 
which leads to performance issues.
   (In the case of Airbnb, more than 1 million logs are generated each day, and 
grouping by dashboard and slice ID, even with indexing, significantly impacts 
database performance as shown in the following log) 
   ```
   | 3793593 | superset         | 100.117.121.20:41904  | superset_production | 
Query       |  1726 s| Sending data                                             
     | SELECT anon_1.dashboard_id AS anon_1_dashboard_id, anon_1.slice_id AS 
anon_1_slice_id, anon_1.action AS anon_1_action, anon_1.dttm AS anon_1_dttm, 
dashboards.slug AS dashboard_slug, dashboards.dashboard_title AS 
dashboards_dashboard_title, slices.slice_name AS slices_slice_name
   FROM (SELECT logs.dashboard_id AS dashboard_id, logs.slice_id AS slice_id, 
logs.action AS action, max(logs.dttm) AS dttm
   FROM logs
   WHERE logs.action IN ('explore', 'dashboard') AND logs.user_id = 13295 AND 
logs.dttm > '2024-03-06 18:00:12.201653' AND (logs.dashboard_id IS NOT NULL OR 
logs.slice_id IS NOT NULL) GROUP BY logs.dashboard_id, logs.slice_id, 
logs.action) AS anon_1 LEFT OUTER JOIN dashboards ON dashboards.id = 
anon_1.dashboard_id LEFT OUTER JOIN slices ON slices.id = anon_1.slice_id
   WHERE dashboards.dashboard_title != '' OR slices.slice_name != '' ORDER BY 
anon_1.dttm DESC
    LIMIT 0, 6 |
   ```
   
   To resolve this issue, it would be appropriate to create a materialized 
activity statistics view through a daily/hourly batch job. However, 
functionally, the main purpose of recent activity is to display only a few of 
the most recently visited items. Therefore, we improved performance by changing 
the approach to fetch the latest (including duplicates) log entries (by 
`distinct: false`) and extract a distinct list using LRU cache in the frontend 
side.
   
   ### TESTING INSTRUCTIONS
   specs
   
   ### ADDITIONAL INFORMATION
   <!--- Check any relevant boxes with "x" -->
   <!--- HINT: Include "Fixes #nnn" if you are fixing an existing issue -->
   - [ ] Has associated issue:
   - [ ] Required feature flags:
   - [ ] Changes UI
   - [ ] Includes DB Migration (follow approval process in 
[SIP-59](https://github.com/apache/superset/issues/13351))
     - [ ] Migration is atomic, supports rollback & is backwards-compatible
     - [ ] Confirm DB migration upgrade and downgrade tested
     - [ ] Runtime estimates and downtime expectations provided
   - [ ] Introduces new feature or API
   - [ ] Removes existing feature or API
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@superset.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscr...@superset.apache.org
For additional commands, e-mail: notifications-h...@superset.apache.org

Reply via email to