john-bodley opened a new pull request #5449: URL: https://github.com/apache/superset/pull/5449
We've noticed a number of anomalies in our database caused by ill-defined forms and/or table schema definitions. This PR resolves a number of issues related to the `tables` table including: - Modifies the uniqueness logic defined by PR https://github.com/apache/incubator-superset/pull/3978 (which was never created in a migration) to include the schema name (previously it was only based on the database/table). This is now handled by a SQLAlchemy event listener as many dialects permit multiple NULL values for columns than can contain NULL. - Removes the pre-add for checking for uniqueness. This is no longer required as the uniqueness constraint defines the same logic. This also only worked when adding a table and thus it was possible to have duplicates names if someone edited the record. - Adds form validation to ensure that the schema name is not part of the table name. - Ensures the `table_name` column is non-nullable.  Note this migration will fail if the `table_name` column is NULL. One must manually fix these records as programmatically trying to remedy these invalid records is difficult as the intent is unclear and the tables may function (from a query standpoint) if SQL is provided. The following query determines which records are problematic: ``` SELECT * FROM tables WHERE table_name IS NULL ``` Finally this migration _will_ fail if the `tables` table is corrupt in terms of the uniqueness. One **must** manually consolidate duplicate or non-valid records given there's no way of programmatically removing invalid records. The following query determines whether there are duplicates: ``` SELECT database_id, `schema`, table_name, COUNT(1) FROM tables GROUP BY 1, 2, 3 HAVING COUNT(1) > 1 ORDER BY COUNT(1) DESC ``` Note this PR is gated by https://github.com/apache/incubator-superset/pull/5445 and https://github.com/apache/incubator-superset/pull/7084 which ensure that empty strings associated with form-data wont persist in the database and is necessary for ensuring that the relevant entries are non-NULL. to: @fabianmenges @graceguo-supercat @michellethomas @mistercrunch @timifasubaa -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
