Re: [PR] refactor(db): composite PK on M2M association tables (sc-105349) [superset]

via GitHub Thu, 07 May 2026 09:39:45 -0700


mikebridge commented on PR #39859:
URL: https://github.com/apache/superset/pull/39859#issuecomment-4399084108


   This should help to assess the impact on postgres:
   
   ```sql
     -- =====================================================================   
                                                                                
                                 
     -- 1. Row counts and on-disk size per affected table                       
                     
     --    Tells us how much work the migration will do per table.              
                                                                                
                                
     --    The two tables marked WITH UNIQUE go through the slower              
                                                                                
                                 
     --    "full table rewrite" path; the other six are direct ALTER.           
                       
     -- =====================================================================   
                                                                                
                                 
                                                                                
          
     WITH affected(name, has_unique) AS (                                       
                                                                                
                                 
       VALUES                                                                   
                  
         ('dashboard_roles',       false),                                      
                                                                                
                                 
         ('dashboard_slices',      true),   -- full rewrite                 
         ('dashboard_user',        false),                                      
                                                                                
                                 
         ('report_schedule_user',  true),   -- full rewrite                     
                                                                                
                                 
         ('rls_filter_roles',      false),                                      
                                                                                
                                 
         ('rls_filter_tables',     false),                                      
                                                                                
                                 
         ('slice_user',            false),                                      
                                                                                
                                 
         ('sqlatable_user',        false)                                       
                  
     )                                                                          
                                                                                
                                 
     SELECT                                                                     
                                                                                
                                 
       a.name                                                AS table_name, 
       CASE WHEN a.has_unique THEN 'recreate (full rewrite)'                    
                                                                                
                                 
            ELSE 'direct ALTER' END                          AS migration_path, 
          
       c.reltuples::bigint                                   AS estimated_rows, 
                                                                                
                                 
       pg_size_pretty(pg_total_relation_size(c.oid))         AS total_size, 
       pg_size_pretty(pg_relation_size(c.oid))               AS heap_size,      
                                                                                
                                 
       pg_size_pretty(pg_indexes_size(c.oid))                AS index_size      
                  
     FROM affected a                                                            
                                                                                
                                 
     JOIN pg_class c ON c.relname = a.name AND c.relkind = 'r'                  
                  
     ORDER BY pg_total_relation_size(c.oid) DESC;                               
                                                                                
                                 
   ```
   
   ... and these will give some more details on some of the other data 
integrity issues:
   
   ```sql                                                                       
                                                                                
                                       
     -- =====================================================================   
                                                                                
                                 
     -- 2. Exact duplicate-row counts per table.                                
                                                                                
                                 
     --    The migration deletes duplicates (keeping MIN(id) per group).        
                                                                                
                                 
     --    If any of these counts are nonzero, audit-sensitive operators        
                                                                                
                                 
     --    should dump the affected rows BEFORE applying the migration.         
                                                                                
                                 
     -- =====================================================================   
                  
                                                                                
                                                                                
                                 
     SELECT 'dashboard_roles'      AS t, COUNT(*) AS dup_groups, SUM(c) - 
COUNT(*) AS rows_dropped
       FROM (SELECT COUNT(*) c FROM dashboard_roles      GROUP BY dashboard_id, 
role_id            HAVING COUNT(*) > 1) g                                       
                                 
     UNION ALL SELECT 'dashboard_slices',    COUNT(*), SUM(c) - COUNT(*)        
                                                                                
                                 
       FROM (SELECT COUNT(*) c FROM dashboard_slices     GROUP BY dashboard_id, 
slice_id           HAVING COUNT(*) > 1) g
     UNION ALL SELECT 'dashboard_user',      COUNT(*), SUM(c) - COUNT(*)        
                                                                                
                                 
       FROM (SELECT COUNT(*) c FROM dashboard_user       GROUP BY user_id, 
dashboard_id            HAVING COUNT(*) > 1) g                                  
                                      
     UNION ALL SELECT 'report_schedule_user',COUNT(*), SUM(c) - COUNT(*)        
                                                                                
                                 
       FROM (SELECT COUNT(*) c FROM report_schedule_user GROUP BY user_id, 
report_schedule_id      HAVING COUNT(*) > 1) g                                  
                                      
     UNION ALL SELECT 'rls_filter_roles',    COUNT(*), SUM(c) - COUNT(*)        
                                                                                
                                 
       FROM (SELECT COUNT(*) c FROM rls_filter_roles     GROUP BY role_id, 
rls_filter_id           HAVING COUNT(*) > 1) g
     UNION ALL SELECT 'rls_filter_tables',   COUNT(*), SUM(c) - COUNT(*)        
                                                                                
                                 
       FROM (SELECT COUNT(*) c FROM rls_filter_tables    GROUP BY table_id, 
rls_filter_id          HAVING COUNT(*) > 1) g                                   
                                     
     UNION ALL SELECT 'slice_user',          COUNT(*), SUM(c) - COUNT(*)        
                                                                                
                                 
       FROM (SELECT COUNT(*) c FROM slice_user           GROUP BY user_id, 
slice_id                HAVING COUNT(*) > 1) g                                  
                                      
     UNION ALL SELECT 'sqlatable_user',      COUNT(*), SUM(c) - COUNT(*)        
                                                                                
                                 
       FROM (SELECT COUNT(*) c FROM sqlatable_user       GROUP BY user_id, 
table_id                HAVING COUNT(*) > 1) g                                  
                                      
     ORDER BY rows_dropped DESC NULLS LAST;                                     
                                                                                
                                 
   ```
   
   ```sql                                                                       
                                                                                
                                       
     -- =====================================================================   
                                                                                
                                 
     -- 3. NULL-FK row counts.                                                  
                  
     --    The migration deletes any row with NULL in either FK column          
                                                                                
                                 
     --    (since PK columns must be NOT NULL). Should normally be zero;        
                  
     --    nonzero means application bugs or manual SQL produced bad rows.      
                                                                                
                                 
     -- =====================================================================   
                                                                                
                                 
                                                                                
                                                                                
                                 
     SELECT 'dashboard_roles'       AS t, COUNT(*) FILTER (WHERE dashboard_id 
IS NULL OR role_id IS NULL)             AS null_fk_rows FROM dashboard_roles    
                                   
     UNION ALL SELECT 'dashboard_slices',     COUNT(*) FILTER (WHERE 
dashboard_id IS NULL OR slice_id IS NULL)        FROM dashboard_slices          
                                            
     UNION ALL SELECT 'dashboard_user',       COUNT(*) FILTER (WHERE user_id IS 
NULL OR dashboard_id IS NULL)         FROM dashboard_user                       
                                 
     UNION ALL SELECT 'report_schedule_user', COUNT(*) FILTER (WHERE user_id IS 
NULL OR report_schedule_id IS NULL)   FROM report_schedule_user
     UNION ALL SELECT 'rls_filter_roles',     COUNT(*) FILTER (WHERE role_id IS 
NULL OR rls_filter_id IS NULL)        FROM rls_filter_roles                     
                                 
     UNION ALL SELECT 'rls_filter_tables',    COUNT(*) FILTER (WHERE table_id 
IS NULL OR rls_filter_id IS NULL)       FROM rls_filter_tables                  
                                   
     UNION ALL SELECT 'slice_user',           COUNT(*) FILTER (WHERE user_id IS 
NULL OR slice_id IS NULL)             FROM slice_user                           
                                 
     UNION ALL SELECT 'sqlatable_user',       COUNT(*) FILTER (WHERE user_id IS 
NULL OR table_id IS NULL)             FROM sqlatable_user                       
                                 
     ORDER BY null_fk_rows DESC;                                                
                                                                                
                                 
   ```
   
   If this returns any rows it'd _really_ be a bad thing:
   
   ```sql                                                                       
                        
     -- =====================================================================   
                                                                                
                                 
     -- 4. External FK references to the soon-to-be-removed `id` columns.       
                                                                                
                                 
     --    The migration runs this same check as a pre-flight assertion and     
                                                                                
                                 
     --    aborts if anything is found. Run it ahead of time so you know        
                                                                                
                                 
     --    what (if anything) needs to be migrated/dropped first. On a          
                                                                                
                                 
     --    standard Superset deployment this should return zero rows.           
                                                                                
                                 
     --    (Default schema only; multi-schema deployments need to broaden.)     
                                                                                
                                 
     -- =====================================================================   
                                                                                
                                 
                                                                                
                                                                                
                                 
     SELECT                                                                     
                                                                                
                                 
       rc.constraint_name,                                                      
                                                                                
                                 
       kcu.table_schema || '.' || kcu.table_name AS referencing_table,          
                                                                                
                                 
       kcu.column_name                           AS referencing_column,         
                                                                                
                                 
       ccu.table_name                            AS referenced_table,           
                                                                                
                                 
       ccu.column_name                           AS referenced_column           
                                                                                
                                 
     FROM information_schema.referential_constraints rc                         
                                                                                
                                 
     JOIN information_schema.key_column_usage      kcu                          
                                                                                
                                 
       ON kcu.constraint_name = rc.constraint_name                              
                                                                                
                                 
      AND kcu.constraint_schema = rc.constraint_schema                      
     JOIN information_schema.constraint_column_usage ccu                        
                                                                                
                                 
       ON ccu.constraint_name = rc.constraint_name                
      AND ccu.constraint_schema = rc.constraint_schema                          
                                                                                
                                 
     WHERE ccu.table_name IN (                                                  
                  
             'dashboard_roles','dashboard_slices','dashboard_user',             
                                                                                
                                 
             'report_schedule_user','rls_filter_roles','rls_filter_tables', 
             'slice_user','sqlatable_user')                                     
                                                                                
                                 
       AND ccu.column_name = 'id';                                              
                                                                                
                                 
   ```
   
   ```sql                                                                       
                
     -- =====================================================================   
                                                                                
                                 
     -- 5. Lock-window sizing for the recreate path on the two heaviest         
                                                                                
                                 
     --    tables. The "full table rewrite" path takes an ACCESS EXCLUSIVE      
                  
     --    lock on the table for the duration. Estimate the rewrite time        
                                                                                
                                 
     --    by combining heap size with your hardware's sequential write         
                                                                                
                                 
     --    rate (≈100–200 MB/s on commodity SSD; faster on NVMe).               
                  
     -- =====================================================================   
                                                                                
                                 
                                                                                
                  
     SELECT                                                                     
                                                                                
                                 
       c.relname                                       AS table_name,           
                                                                                
                                 
       pg_size_pretty(pg_relation_size(c.oid))         AS heap_size,            
                                                                                
                                 
       pg_relation_size(c.oid) / 1024 / 1024           AS heap_size_mb,         
                                                                                
                                 
       -- conservative estimate: 100 MB/s effective rewrite throughput          
                                                                                
                                 
       ROUND(pg_relation_size(c.oid) / 1024 / 1024 / 100.0, 1) AS 
est_rewrite_seconds_at_100mbs                                                   
                                               
     FROM pg_class c                                                            
                                                                                
                                 
     WHERE c.relname IN ('dashboard_slices', 'report_schedule_user');      
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] refactor(db): composite PK on M2M association tables (sc-105349) [superset]

Reply via email to