Re: [PR] refactor(db): composite PK on M2M association tables (sc-105349) [superset]

via GitHub Thu, 07 May 2026 10:08:59 -0700


mikebridge commented on PR #39859:
URL: https://github.com/apache/superset/pull/39859#issuecomment-4399302610


   Here's the mysql equivalent:
   
   **1. Per-table size, row count, and which migration path each takes.** Note 
that on InnoDB, ADD/DROP PRIMARY KEY rebuilds the clustered index, so all eight 
tables undergo a full rebuild —  not just the two that go through the explicit 
`recreate="always"` path:                                                       
                                                              
                                                                                
                                                                                
                                 
     ```sql                                                                     
                                                                                
                                 
     SELECT                                                                     
                                                                                
                                 
       TABLE_NAME AS table_name,                                                
                        
       CASE WHEN TABLE_NAME IN ('dashboard_slices', 'report_schedule_user')     
                                                                                
                                 
            THEN 'recreate (explicit, drops UNIQUE)'
            ELSE 'direct ALTER (still rebuilds InnoDB clustered index)'         
                                                                                
                                 
       END AS migration_path,                                                   
          
       TABLE_ROWS                                                  AS 
estimated_rows,                                                                 
                                           
       CONCAT(ROUND((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024, 1), ' MB') AS 
total_size,         
       CONCAT(ROUND(DATA_LENGTH / 1024 / 1024, 1), ' MB')          AS 
heap_size,                  
       CONCAT(ROUND(INDEX_LENGTH / 1024 / 1024, 1), ' MB')         AS 
index_size                  
     FROM information_schema.TABLES                                             
          
     WHERE TABLE_SCHEMA = DATABASE()                                            
                                                                                
                                 
       AND TABLE_NAME IN (                                                      
                                                                                
                                 
         'dashboard_roles', 'dashboard_slices', 'dashboard_user',               
                                                                                
                                 
         'report_schedule_user', 'rls_filter_roles', 'rls_filter_tables',       
                                                                                
                                 
         'slice_user', 'sqlatable_user'                                         
                                                                                
                                 
       )                                                                        
                                                                                
                                 
     ORDER BY (DATA_LENGTH + INDEX_LENGTH) DESC;                                
                                                                                
                                 
   ```
                                                                                
                                                                                
                                 
     2. Aggregated duplicate-row roll-up. dup_groups is the number of (fk1, 
fk2) pairs that appear more than once; rows_dropped is the total number of rows 
the migration will delete during the 
     dedupe pass (it keeps MIN(id) per group):                                  
                                                                                
                                 
   
   ```sql                                                                       
  
     SELECT 'dashboard_roles'      AS t, COUNT(*) AS dup_groups, SUM(c) - 
COUNT(*) AS rows_dropped                                                        
                                       
       FROM (SELECT COUNT(*) c FROM dashboard_roles      GROUP BY dashboard_id, 
role_id            HAVING COUNT(*) > 1) g
     UNION ALL SELECT 'dashboard_slices',    COUNT(*), SUM(c) - COUNT(*)        
                                                                                
                                 
       FROM (SELECT COUNT(*) c FROM dashboard_slices     GROUP BY dashboard_id, 
slice_id           HAVING COUNT(*) > 1) g
     UNION ALL SELECT 'dashboard_user',      COUNT(*), SUM(c) - COUNT(*)        
                                                                                
                                 
       FROM (SELECT COUNT(*) c FROM dashboard_user       GROUP BY user_id, 
dashboard_id            HAVING COUNT(*) > 1) g                                  
                                      
     UNION ALL SELECT 'report_schedule_user',COUNT(*), SUM(c) - COUNT(*)        
                                                                                
                                 
       FROM (SELECT COUNT(*) c FROM report_schedule_user GROUP BY user_id, 
report_schedule_id      HAVING COUNT(*) > 1) g                                  
                                      
     UNION ALL SELECT 'rls_filter_roles',    COUNT(*), SUM(c) - COUNT(*)        
                  
       FROM (SELECT COUNT(*) c FROM rls_filter_roles     GROUP BY role_id, 
rls_filter_id           HAVING COUNT(*) > 1) g                                  
                                      
     UNION ALL SELECT 'rls_filter_tables',   COUNT(*), SUM(c) - COUNT(*)        
                                                                                
                                 
       FROM (SELECT COUNT(*) c FROM rls_filter_tables    GROUP BY table_id, 
rls_filter_id          HAVING COUNT(*) > 1) g                                   
                                     
     UNION ALL SELECT 'slice_user',          COUNT(*), SUM(c) - COUNT(*)        
                                                                                
                                 
       FROM (SELECT COUNT(*) c FROM slice_user           GROUP BY user_id, 
slice_id                HAVING COUNT(*) > 1) g                                  
                                      
     UNION ALL SELECT 'sqlatable_user',      COUNT(*), SUM(c) - COUNT(*)        
                                                                                
                                 
       FROM (SELECT COUNT(*) c FROM sqlatable_user       GROUP BY user_id, 
table_id                HAVING COUNT(*) > 1) g                                  
                                      
     ORDER BY rows_dropped DESC;                                                
                                                                                
                                 
   ```
                                                                                
                                                                                
                                 
     3. External-FK pre-flight check. The migration runs the equivalent at 
upgrade time and aborts if anything is found. Should return zero rows on a 
stock Superset install:                    
   
   ```sql                                                                       
                        
     SELECT                                                                     
                                                                                
                                 
       CONSTRAINT_NAME,                                                     
       CONCAT(TABLE_SCHEMA, '.', TABLE_NAME)        AS referencing_table,       
                                                                                
                                 
       COLUMN_NAME                                  AS referencing_column,      
                                                                                
                                 
       REFERENCED_TABLE_NAME                        AS referenced_table,        
                                                                                
                                 
       REFERENCED_COLUMN_NAME                       AS referenced_column        
                                                                                
                                 
     FROM information_schema.KEY_COLUMN_USAGE                                   
                                                                                
                                 
     WHERE TABLE_SCHEMA = DATABASE()                                            
                  
       AND REFERENCED_TABLE_NAME IN (                                           
                                                                                
                                 
         'dashboard_roles', 'dashboard_slices', 'dashboard_user',               
                                                                                
                                 
         'report_schedule_user', 'rls_filter_roles', 'rls_filter_tables',       
                  
         'slice_user', 'sqlatable_user'                                         
                                                                                
                                 
       )                                                                        
                                                                                
                                 
       AND REFERENCED_COLUMN_NAME = 'id';                                       
                                                                                
                                 
   ```
                                                                                
                                                                                
                                 
     4. Lock-window estimate for all eight tables. ADD PRIMARY KEY is INPLACE 
but not LOCK=NONE — concurrent reads OK, writes blocked. Combine heap_size_mb 
with your effective rebuild
     throughput (~100–200 MB/s on commodity SSD; higher on NVMe) to size the 
maintenance window per table:                                                   
                                    
   
   ```sql                                                                       
  
     SELECT                                                                     
                                                                                
                                 
       TABLE_NAME                                            AS table_name,     
                  
       CONCAT(ROUND(DATA_LENGTH / 1024 / 1024, 1), ' MB')    AS heap_size,      
                                                                                
                                 
       ROUND(DATA_LENGTH / 1024 / 1024, 1)                   AS heap_size_mb,
       ROUND(DATA_LENGTH / 1024 / 1024 / 100.0, 1)           AS 
est_rewrite_seconds_at_100mbs                                                   
                                                 
     FROM information_schema.TABLES                                             
                                                                                
                                 
     WHERE TABLE_SCHEMA = DATABASE()                                            
                                                                                
                                 
       AND TABLE_NAME IN (                                                      
                                                                                
                                 
         'dashboard_roles', 'dashboard_slices', 'dashboard_user',               
                  
         'report_schedule_user', 'rls_filter_roles', 'rls_filter_tables',       
                  
         'slice_user', 'sqlatable_user'                                         
          
       )                                                                        
                                                                                
                                 
     ORDER BY DATA_LENGTH DESC;
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] refactor(db): composite PK on M2M association tables (sc-105349) [superset]

Reply via email to