betodealmeida opened a new pull request #16052:
URL: https://github.com/apache/superset/pull/16052

### SUMMARY

This PR fixes a bug caused by several small errors that are harmless on their own.

Bug 1. When running `load-examples`, the datasets created from configs (https://github.com/apache/superset/tree/master/superset/examples/configs/datasets/examples) were loaded with their schema set to `NULL`. The bug was fixed in https://github.com/apache/superset/pull/16041, but many datasets in the wild still have `NULL` as their schema.

Bug 2. When following the [dashboard creation tutorial](https://superset.apache.org/docs/creating-charts-dashboards/first-dashboard), the user is instructed to create a dataset called `cleaned_sales_data` in the `public` schema. This results in **two** datasets with the same name:

1. `[NULL].cleaned_sales_data` with UUID A (added by `load-examples`)
2. `public.cleaned_sales_data` with UUID B (added by the user)

This **shouldn't be possible**, because `tables` had a uniqueness constraint on `(database_id, table_name)` at the time (https://github.com/apache/superset/blob/a786373fffe1a5dc5e2419505f982b14dcc09305/superset/connectors/sqla/models.py#L486). However, that constraint was enforced by the application, not by the database.

If the user now runs `load-examples` again (with the fix from https://github.com/apache/superset/pull/16041), we'll try to import the dataset `public.cleaned_sales_data` with UUID A. The helper `import_from_dict` then runs a query similar to this one to check for uniqueness:

```sql
SELECT * FROM tables WHERE (table_name='cleaned_sales_data' AND `schema`='public') OR uuid='A'
```

This query returns both datasets: the user's dataset matches on name and schema, and the example dataset matches on UUID. The uniqueness check is therefore ambiguous, and the import fails.

One solution would be to delete all datasets whose schema is `NULL` (assuming they should have one) and then run `load-examples` again, but this would overwrite any custom datasets created by users. Instead, I changed the `load-examples` script to skip an import when duplicates are found.
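
The uniqueness check and the skip-on-duplicate behavior described above can be sketched with an in-memory SQLite table. This is a simplified illustration, not Superset's actual `import_from_dict`: the table layout, the hypothetical helpers `make_example_db`, `find_existing` and `import_dataset`, and their string return values are assumptions made for the example. It uses standard double-quote identifier quoting instead of the MySQL-style backticks in the query above.

```python
import sqlite3
from typing import List, Optional

# Hypothetical, simplified stand-in for Superset's `tables` metadata table.
# There is deliberately no UNIQUE constraint: uniqueness is only checked by
# the application, as described in the PR.
DDL = """
CREATE TABLE tables (
    id INTEGER PRIMARY KEY,
    table_name TEXT NOT NULL,
    "schema" TEXT,
    uuid TEXT NOT NULL
)
"""


def make_example_db() -> sqlite3.Connection:
    """Recreate the state after an old `load-examples` run plus the tutorial."""
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    rows = [
        ("cleaned_sales_data", None, "A"),      # from `load-examples`, schema NULL
        ("cleaned_sales_data", "public", "B"),  # created by the user
    ]
    conn.executemany(
        'INSERT INTO tables (table_name, "schema", uuid) VALUES (?, ?, ?)', rows
    )
    return conn


def find_existing(
    conn: sqlite3.Connection, table_name: str, schema: Optional[str], uuid: str
) -> List[int]:
    """Return ids of rows matching on (table_name, schema) OR on uuid.

    `"schema" = ?` is never true for a NULL schema (SQL three-valued logic),
    so the legacy NULL-schema row can only match through its UUID.
    """
    rows = conn.execute(
        "SELECT id FROM tables "
        'WHERE (table_name = ? AND "schema" = ?) OR uuid = ? '
        "ORDER BY id",
        (table_name, schema, uuid),
    ).fetchall()
    return [row[0] for row in rows]


def import_dataset(
    conn: sqlite3.Connection, table_name: str, schema: Optional[str], uuid: str
) -> str:
    """Insert or update a dataset, skipping it when the lookup is ambiguous."""
    matches = find_existing(conn, table_name, schema, uuid)
    if len(matches) > 1:
        # Two different rows claim this dataset: skip it instead of failing.
        return "skipped"
    if len(matches) == 1:
        conn.execute(
            'UPDATE tables SET table_name = ?, "schema" = ?, uuid = ? WHERE id = ?',
            (table_name, schema, uuid, matches[0]),
        )
        return "updated"
    conn.execute(
        'INSERT INTO tables (table_name, "schema", uuid) VALUES (?, ?, ?)',
        (table_name, schema, uuid),
    )
    return "inserted"


conn = make_example_db()
print(find_existing(conn, "cleaned_sales_data", "public", "A"))  # [1, 2]
print(import_dataset(conn, "cleaned_sales_data", "public", "A"))  # skipped
```

Skipping leaves both existing rows untouched. Deleting the `NULL`-schema rows first would also make the lookup unambiguous, but it would discard any user edits to those datasets.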
This should affect only a small number of datasets, and since we have made no changes to the `cleaned_sales_data` dataset, it's fine to skip it.

### BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A

### TESTING INSTRUCTIONS

To replicate:

1. Run `load-examples` with a SHA from before https://github.com/apache/superset/pull/16041 was merged.
2. Add a dataset called `cleaned_sales_data` in a given schema.
3. Check that there are two datasets called `cleaned_sales_data`, one with a schema and one without.
4. Upgrade to a SHA that includes https://github.com/apache/superset/pull/16041.
5. Run `load-examples` again; it should now complete successfully.

### ADDITIONAL INFORMATION

- [ ] Has associated issue:
- [ ] Changes UI
- [ ] Includes DB Migration (follow approval process in [SIP-59](https://github.com/apache/superset/issues/13351))
  - [ ] Migration is atomic, supports rollback & is backwards-compatible
  - [ ] Confirm DB migration upgrade and downgrade tested
  - [ ] Runtime estimates and downtime expectations provided
- [ ] Introduces new feature or API
- [ ] Removes existing feature or API
