betodealmeida opened a new pull request #16052:
URL: https://github.com/apache/superset/pull/16052


   ### SUMMARY
   <!--- Describe the change below, including rationale and design decisions -->
   
   This PR fixes a bug caused by a combination of small errors that are 
harmless in isolation:
   
   Bug 1. When running `load-examples`, the datasets created from configs 
(https://github.com/apache/superset/tree/master/superset/examples/configs/datasets/examples)
 were loaded with their schema set to `NULL`. This was fixed in 
https://github.com/apache/superset/pull/16041, but there are many datasets in 
the wild with `NULL` as their schema.
   
   Bug 2. When following the [dashboard creation 
tutorial](https://superset.apache.org/docs/creating-charts-dashboards/first-dashboard)
 the user is instructed to create a dataset called `cleaned_sales_data` in the 
`public` schema. This results in **two** datasets with the same name:
   
   1. `[NULL].cleaned_sales_data` with UUID A (added by `load-examples`)
   2. `public.cleaned_sales_data` with UUID B (added by the user)
   
   This **shouldn't be possible**, because `tables` had a uniqueness constraint 
on `(database_id, table_name)` at the time 
(https://github.com/apache/superset/blob/a786373fffe1a5dc5e2419505f982b14dcc09305/superset/connectors/sqla/models.py#L486).
 But the constraint was enforced by the application, not by the database.
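   To see why an application-level constraint isn't enough, here is a minimal 
sketch (using stdlib `sqlite3` with a simplified, hypothetical version of the 
`tables` table): if the database table itself was created without the 
`UNIQUE` constraint, nothing stops duplicate rows from being inserted by a 
path that bypasses the application check.

```python
# Sketch: a model-level uniqueness rule doesn't help if the actual
# database table was created without the constraint (stdlib sqlite3 only;
# the columns are a simplified stand-in for Superset's `tables` table).
import sqlite3

conn = sqlite3.connect(":memory:")
# The table as it exists in the wild: no UNIQUE(database_id, table_name).
conn.execute("CREATE TABLE tables (database_id INTEGER, table_name TEXT)")
conn.execute("INSERT INTO tables VALUES (1, 'cleaned_sales_data')")
conn.execute("INSERT INTO tables VALUES (1, 'cleaned_sales_data')")  # no error
count = conn.execute("SELECT COUNT(*) FROM tables").fetchone()[0]
print(count)  # 2 -- the duplicate was accepted
```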
   
   If the user now runs `load-examples` again (with the fix from 
https://github.com/apache/superset/pull/16041) we'll try to import the dataset 
`public.cleaned_sales_data` with UUID A. The helper `import_from_dict` will 
then run a query similar to this to check for uniqueness:
   
   ```sql
   SELECT * FROM tables
   WHERE (name = 'cleaned_sales_data' AND `schema` = 'public')
      OR uuid = 'A'
   ```
   
   And this returns both datasets.
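   The double match can be reproduced in a few lines (stdlib `sqlite3`; the 
`schema_` column and the rows are hypothetical stand-ins for the real 
`tables` table): one row matches on UUID, the other on name-plus-schema.

```python
# Sketch: why the lookup-by-(name, schema)-or-UUID matches both datasets.
import sqlite3

conn = sqlite3.connect(":memory:")
# schema_ stands in for Superset's `schema` column.
conn.execute("CREATE TABLE tables (name TEXT, schema_ TEXT, uuid TEXT)")
# Dataset left behind by the old load-examples (schema NULL, UUID A).
conn.execute("INSERT INTO tables VALUES ('cleaned_sales_data', NULL, 'A')")
# Dataset created by the user following the tutorial (schema public, UUID B).
conn.execute("INSERT INTO tables VALUES ('cleaned_sales_data', 'public', 'B')")

rows = conn.execute(
    "SELECT uuid FROM tables "
    "WHERE (name = 'cleaned_sales_data' AND schema_ = 'public') OR uuid = 'A'"
).fetchall()
print(rows)  # both rows match: [('A',), ('B',)]
```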
   
   A solution would be to delete all the datasets that have schema set to 
`NULL` (assuming they should have one), and then run `load-examples` again. But 
this would overwrite any custom datasets created by users.
   
   Instead, I changed the `load-examples` script to skip an import when 
duplicates are found. This should affect only a small number of datasets, and 
since we have made no changes to the `cleaned_sales_data` dataset it's fine to 
skip it.
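   The guard can be sketched in plain Python (the function names and dict 
shapes below are illustrative, not Superset's actual import API): when the 
lookup matches more than one existing dataset, the import is skipped instead 
of overwriting.

```python
# Hypothetical, simplified sketch of the skip-on-duplicate guard; the real
# load-examples code goes through Superset's import helpers.
def find_matches(existing, name, schema, uuid):
    """Return datasets matching either (name, schema) or uuid."""
    return [
        d for d in existing
        if (d["name"] == name and d["schema"] == schema) or d["uuid"] == uuid
    ]

def import_dataset(existing, config):
    matches = find_matches(
        existing, config["name"], config["schema"], config["uuid"]
    )
    if len(matches) > 1:
        # Ambiguous: a NULL-schema leftover and a user-created dataset both
        # match. Skip rather than overwrite the user's dataset.
        return "skipped"
    return "imported"

existing = [
    {"name": "cleaned_sales_data", "schema": None, "uuid": "A"},
    {"name": "cleaned_sales_data", "schema": "public", "uuid": "B"},
]
config = {"name": "cleaned_sales_data", "schema": "public", "uuid": "A"}
print(import_dataset(existing, config))  # skipped
```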
   
   ### BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
   <!--- Skip this if not applicable -->
   
   N/A
   
   ### TESTING INSTRUCTIONS
   <!--- Required! What steps can be taken to manually verify the changes? -->
   
   To replicate:
   
   1. Run `load-examples` with a SHA before 
https://github.com/apache/superset/pull/16041 was merged.
   2. Add a dataset called `cleaned_sales_data` in a given schema.
   3. Check that there are two datasets called `cleaned_sales_data`, one with 
a schema and one without.
   4. Upgrade to a SHA that includes https://github.com/apache/superset/pull/16041.
   5. Run `load-examples` again; it should now succeed.
   
   ### ADDITIONAL INFORMATION
   <!--- Check any relevant boxes with "x" -->
   <!--- HINT: Include "Fixes #nnn" if you are fixing an existing issue -->
   - [ ] Has associated issue:
   - [ ] Changes UI
   - [ ] Includes DB Migration (follow approval process in 
[SIP-59](https://github.com/apache/superset/issues/13351))
     - [ ] Migration is atomic, supports rollback & is backwards-compatible
     - [ ] Confirm DB migration upgrade and downgrade tested
     - [ ] Runtime estimates and downtime expectations provided
   - [ ] Introduces new feature or API
   - [ ] Removes existing feature or API
   

