mikebridge commented on PR #39859: URL: https://github.com/apache/superset/pull/39859#issuecomment-4399302610
Here's the mysql equivalent:
**1. Per-table size, row count, and which migration path each takes.** Note
that on InnoDB, ADD/DROP PRIMARY KEY rebuilds the clustered index, so all eight
tables undergo a full rebuild — not just the two that go through the explicit
`recreate="always"` path:
```sql
SELECT
TABLE_NAME AS table_name,
CASE WHEN TABLE_NAME IN ('dashboard_slices', 'report_schedule_user')
THEN 'recreate (explicit, drops UNIQUE)'
ELSE 'direct ALTER (still rebuilds InnoDB clustered index)'
END AS migration_path,
TABLE_ROWS AS
estimated_rows,
CONCAT(ROUND((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024, 1), ' MB') AS
total_size,
CONCAT(ROUND(DATA_LENGTH / 1024 / 1024, 1), ' MB') AS
heap_size,
CONCAT(ROUND(INDEX_LENGTH / 1024 / 1024, 1), ' MB') AS
index_size
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
AND TABLE_NAME IN (
'dashboard_roles', 'dashboard_slices', 'dashboard_user',
'report_schedule_user', 'rls_filter_roles', 'rls_filter_tables',
'slice_user', 'sqlatable_user'
)
ORDER BY (DATA_LENGTH + INDEX_LENGTH) DESC;
```
2. Aggregated duplicate-row roll-up. dup_groups is the number of (fk1,
fk2) pairs that appear more than once; rows_dropped is the total number of rows
the migration will delete during the
dedupe pass (it keeps MIN(id) per group):
```sql
SELECT 'dashboard_roles' AS t, COUNT(*) AS dup_groups, SUM(c) -
COUNT(*) AS rows_dropped
FROM (SELECT COUNT(*) c FROM dashboard_roles GROUP BY dashboard_id,
role_id HAVING COUNT(*) > 1) g
UNION ALL SELECT 'dashboard_slices', COUNT(*), SUM(c) - COUNT(*)
FROM (SELECT COUNT(*) c FROM dashboard_slices GROUP BY dashboard_id,
slice_id HAVING COUNT(*) > 1) g
UNION ALL SELECT 'dashboard_user', COUNT(*), SUM(c) - COUNT(*)
FROM (SELECT COUNT(*) c FROM dashboard_user GROUP BY user_id,
dashboard_id HAVING COUNT(*) > 1) g
UNION ALL SELECT 'report_schedule_user',COUNT(*), SUM(c) - COUNT(*)
FROM (SELECT COUNT(*) c FROM report_schedule_user GROUP BY user_id,
report_schedule_id HAVING COUNT(*) > 1) g
UNION ALL SELECT 'rls_filter_roles', COUNT(*), SUM(c) - COUNT(*)
FROM (SELECT COUNT(*) c FROM rls_filter_roles GROUP BY role_id,
rls_filter_id HAVING COUNT(*) > 1) g
UNION ALL SELECT 'rls_filter_tables', COUNT(*), SUM(c) - COUNT(*)
FROM (SELECT COUNT(*) c FROM rls_filter_tables GROUP BY table_id,
rls_filter_id HAVING COUNT(*) > 1) g
UNION ALL SELECT 'slice_user', COUNT(*), SUM(c) - COUNT(*)
FROM (SELECT COUNT(*) c FROM slice_user GROUP BY user_id,
slice_id HAVING COUNT(*) > 1) g
UNION ALL SELECT 'sqlatable_user', COUNT(*), SUM(c) - COUNT(*)
FROM (SELECT COUNT(*) c FROM sqlatable_user GROUP BY user_id,
table_id HAVING COUNT(*) > 1) g
ORDER BY rows_dropped DESC;
```
3. External-FK pre-flight check. The migration runs the equivalent at
upgrade time and aborts if anything is found. Should return zero rows on a
stock Superset install:
```sql
SELECT
CONSTRAINT_NAME,
CONCAT(TABLE_SCHEMA, '.', TABLE_NAME) AS referencing_table,
COLUMN_NAME AS referencing_column,
REFERENCED_TABLE_NAME AS referenced_table,
REFERENCED_COLUMN_NAME AS referenced_column
FROM information_schema.KEY_COLUMN_USAGE
WHERE TABLE_SCHEMA = DATABASE()
AND REFERENCED_TABLE_NAME IN (
'dashboard_roles', 'dashboard_slices', 'dashboard_user',
'report_schedule_user', 'rls_filter_roles', 'rls_filter_tables',
'slice_user', 'sqlatable_user'
)
AND REFERENCED_COLUMN_NAME = 'id';
```
4. Lock-window estimate for all eight tables. ADD PRIMARY KEY is INPLACE
but not LOCK=NONE — concurrent reads OK, writes blocked. Combine heap_size_mb
with your effective rebuild
throughput (~100–200 MB/s on commodity SSD; higher on NVMe) to size the
maintenance window per table:
```sql
SELECT
TABLE_NAME AS table_name,
CONCAT(ROUND(DATA_LENGTH / 1024 / 1024, 1), ' MB') AS heap_size,
ROUND(DATA_LENGTH / 1024 / 1024, 1) AS heap_size_mb,
ROUND(DATA_LENGTH / 1024 / 1024 / 100.0, 1) AS
est_rewrite_seconds_at_100mbs
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
AND TABLE_NAME IN (
'dashboard_roles', 'dashboard_slices', 'dashboard_user',
'report_schedule_user', 'rls_filter_roles', 'rls_filter_tables',
'slice_user', 'sqlatable_user'
)
ORDER BY DATA_LENGTH DESC;
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
