mikebridge commented on PR #39859: URL: https://github.com/apache/superset/pull/39859#issuecomment-4399084108
This should help to assess the impact on postgres:
```sql
-- =====================================================================
-- 1. Row counts and on-disk size per affected table
-- Tells us how much work the migration will do per table.
-- The two tables marked WITH UNIQUE go through the slower
-- "full table rewrite" path; the other six are direct ALTER.
-- =====================================================================
WITH affected(name, has_unique) AS (
VALUES
('dashboard_roles', false),
('dashboard_slices', true), -- full rewrite
('dashboard_user', false),
('report_schedule_user', true), -- full rewrite
('rls_filter_roles', false),
('rls_filter_tables', false),
('slice_user', false),
('sqlatable_user', false)
)
SELECT
a.name AS table_name,
CASE WHEN a.has_unique THEN 'recreate (full rewrite)'
ELSE 'direct ALTER' END AS migration_path,
c.reltuples::bigint AS estimated_rows,
pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size,
pg_size_pretty(pg_relation_size(c.oid)) AS heap_size,
pg_size_pretty(pg_indexes_size(c.oid)) AS index_size
FROM affected a
JOIN pg_class c ON c.relname = a.name AND c.relkind = 'r'
ORDER BY pg_total_relation_size(c.oid) DESC;
```
... and these will give some more details on some of the other data
integrity issues:
```sql
-- =====================================================================
-- 2. Exact duplicate-row counts per table.
-- The migration deletes duplicates (keeping MIN(id) per group).
-- If any of these counts are nonzero, audit-sensitive operators
-- should dump the affected rows BEFORE applying the migration.
-- =====================================================================
SELECT 'dashboard_roles' AS t, COUNT(*) AS dup_groups, SUM(c) -
COUNT(*) AS rows_dropped
FROM (SELECT COUNT(*) c FROM dashboard_roles GROUP BY dashboard_id,
role_id HAVING COUNT(*) > 1) g
UNION ALL SELECT 'dashboard_slices', COUNT(*), SUM(c) - COUNT(*)
FROM (SELECT COUNT(*) c FROM dashboard_slices GROUP BY dashboard_id,
slice_id HAVING COUNT(*) > 1) g
UNION ALL SELECT 'dashboard_user', COUNT(*), SUM(c) - COUNT(*)
FROM (SELECT COUNT(*) c FROM dashboard_user GROUP BY user_id,
dashboard_id HAVING COUNT(*) > 1) g
UNION ALL SELECT 'report_schedule_user',COUNT(*), SUM(c) - COUNT(*)
FROM (SELECT COUNT(*) c FROM report_schedule_user GROUP BY user_id,
report_schedule_id HAVING COUNT(*) > 1) g
UNION ALL SELECT 'rls_filter_roles', COUNT(*), SUM(c) - COUNT(*)
FROM (SELECT COUNT(*) c FROM rls_filter_roles GROUP BY role_id,
rls_filter_id HAVING COUNT(*) > 1) g
UNION ALL SELECT 'rls_filter_tables', COUNT(*), SUM(c) - COUNT(*)
FROM (SELECT COUNT(*) c FROM rls_filter_tables GROUP BY table_id,
rls_filter_id HAVING COUNT(*) > 1) g
UNION ALL SELECT 'slice_user', COUNT(*), SUM(c) - COUNT(*)
FROM (SELECT COUNT(*) c FROM slice_user GROUP BY user_id,
slice_id HAVING COUNT(*) > 1) g
UNION ALL SELECT 'sqlatable_user', COUNT(*), SUM(c) - COUNT(*)
FROM (SELECT COUNT(*) c FROM sqlatable_user GROUP BY user_id,
table_id HAVING COUNT(*) > 1) g
ORDER BY rows_dropped DESC NULLS LAST;
```
```sql
-- =====================================================================
-- 3. NULL-FK row counts.
-- The migration deletes any row with NULL in either FK column
-- (since PK columns must be NOT NULL). Should normally be zero;
-- nonzero means application bugs or manual SQL produced bad rows.
-- =====================================================================
SELECT 'dashboard_roles' AS t, COUNT(*) FILTER (WHERE dashboard_id
IS NULL OR role_id IS NULL) AS null_fk_rows FROM dashboard_roles
UNION ALL SELECT 'dashboard_slices', COUNT(*) FILTER (WHERE
dashboard_id IS NULL OR slice_id IS NULL) FROM dashboard_slices
UNION ALL SELECT 'dashboard_user', COUNT(*) FILTER (WHERE user_id IS
NULL OR dashboard_id IS NULL) FROM dashboard_user
UNION ALL SELECT 'report_schedule_user', COUNT(*) FILTER (WHERE user_id IS
NULL OR report_schedule_id IS NULL) FROM report_schedule_user
UNION ALL SELECT 'rls_filter_roles', COUNT(*) FILTER (WHERE role_id IS
NULL OR rls_filter_id IS NULL) FROM rls_filter_roles
UNION ALL SELECT 'rls_filter_tables', COUNT(*) FILTER (WHERE table_id
IS NULL OR rls_filter_id IS NULL) FROM rls_filter_tables
UNION ALL SELECT 'slice_user', COUNT(*) FILTER (WHERE user_id IS
NULL OR slice_id IS NULL) FROM slice_user
UNION ALL SELECT 'sqlatable_user', COUNT(*) FILTER (WHERE user_id IS
NULL OR table_id IS NULL) FROM sqlatable_user
ORDER BY null_fk_rows DESC;
```
If this returns any rows it'd _really_ be a bad thing:
```sql
-- =====================================================================
-- 4. External FK references to the soon-to-be-removed `id` columns.
-- The migration runs this same check as a pre-flight assertion and
-- aborts if anything is found. Run it ahead of time so you know
-- what (if anything) needs to be migrated/dropped first. On a
-- standard Superset deployment this should return zero rows.
-- (Default schema only; multi-schema deployments need to broaden.)
-- =====================================================================
SELECT
rc.constraint_name,
kcu.table_schema || '.' || kcu.table_name AS referencing_table,
kcu.column_name AS referencing_column,
ccu.table_name AS referenced_table,
ccu.column_name AS referenced_column
FROM information_schema.referential_constraints rc
JOIN information_schema.key_column_usage kcu
ON kcu.constraint_name = rc.constraint_name
AND kcu.constraint_schema = rc.constraint_schema
JOIN information_schema.constraint_column_usage ccu
ON ccu.constraint_name = rc.constraint_name
AND ccu.constraint_schema = rc.constraint_schema
WHERE ccu.table_name IN (
'dashboard_roles','dashboard_slices','dashboard_user',
'report_schedule_user','rls_filter_roles','rls_filter_tables',
'slice_user','sqlatable_user')
AND ccu.column_name = 'id';
```
```sql
-- =====================================================================
-- 5. Lock-window sizing for the recreate path on the two heaviest
-- tables. The "full table rewrite" path takes an ACCESS EXCLUSIVE
-- lock on the table for the duration. Estimate the rewrite time
-- by combining heap size with your hardware's sequential write
-- rate (≈100–200 MB/s on commodity SSD; faster on NVMe).
-- =====================================================================
SELECT
c.relname AS table_name,
pg_size_pretty(pg_relation_size(c.oid)) AS heap_size,
pg_relation_size(c.oid) / 1024 / 1024 AS heap_size_mb,
-- conservative estimate: 100 MB/s effective rewrite throughput
ROUND(pg_relation_size(c.oid) / 1024 / 1024 / 100.0, 1) AS
est_rewrite_seconds_at_100mbs
FROM pg_class c
WHERE c.relname IN ('dashboard_slices', 'report_schedule_user');
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
