[GitHub] [superset] ktmud edited a comment on pull request #19421: perf: migrate new dataset models with INSERT FROM

GitBox Fri, 01 Apr 2022 16:14:39 -0700


ktmud edited a comment on pull request #19421:
URL: https://github.com/apache/superset/pull/19421#issuecomment-1086218951



   @eschutho I propose changing the current migration to a no-op and moving my 
updated code to a new migration. 
   
   I DM'ed @betodealmeida and @hughhhh earlier on Slack. Reposting the messages 
here for visibility:
   
   ---
   
   Hi, I noticed we are making more adjustments to SIP-68 models and have 
prepared a [couple](https://github.com/apache/superset/pull/19425) of 
[more](https://github.com/apache/superset/pull/19487) db migrations. I’m 
wondering whether we should bundle all these migrations (including the first 
one that’s already merged) into one new migration and change the original 
migration to a no-op.
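
For illustration, a minimal sketch of what "change the original migration to a no-op" could look like, assuming the usual Alembic-style migration file layout. The revision identifiers below are placeholders, not the real ones; in practice the file would keep whatever identifiers it already declares so the migration chain stays intact.

```python
# Placeholder revision identifiers (hypothetical); a real no-op would keep
# the identifiers the already-merged migration file declares.
revision = "aaaaaaaaaaaa"
down_revision = "bbbbbbbbbbbb"


def upgrade():
    # Intentionally empty: the actual work moves to a later migration.
    pass


def downgrade():
    # upgrade() creates nothing, so there is nothing to undo here.
    pass
```

Databases that already ran the original version would be unaffected by this file and would only pick up the new, bundled migration.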
   
   **Pros:**
   
   - Reduced total migration time: bundling everything should be faster than 
running them separately
   - We get a chance to fix a couple more errors, such as [using MediumText 
for MySQL](https://github.com/apache/superset/pull/19421#discussion_r839942807) 
and [incorrect additive_metric_types 
matching](https://github.com/apache/superset/pull/19421#discussion_r839903477)
   - We get a chance to copy over other missing data such as [changed on and 
last 
updated](https://github.com/apache/superset/pull/19421#discussion_r840089807)
   - We can re-ID the copied entities to follow the original ones, making it 
easier to spot-check potential data inconsistency bugs down the road
   - Everyone’s db is in a clean and consistent state
   - It's easier to review the db structure in the future
   
   **Cons:**
   - Those who already ran the migration and bore the slowness may have to 
experience it again
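
To make the "copy missing data" and "re-ID" points concrete, here is a rough sketch of an `INSERT ... SELECT` copy that carries the original primary keys and `changed_on` values across. It runs against an in-memory SQLite database, and the table and column names (`legacy_tables`, `new_datasets`) are simplified stand-ins, not the actual Superset schema.

```python
import sqlite3


def copy_with_original_ids(conn):
    # One set-based statement instead of row-by-row ORM inserts.
    # Copying `id` explicitly keeps new rows aligned with the old ones.
    conn.execute(
        """
        INSERT INTO new_datasets (id, name, changed_on)
        SELECT id, table_name, changed_on
        FROM legacy_tables
        """
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical, simplified tables for illustration only.
    CREATE TABLE legacy_tables (
        id INTEGER PRIMARY KEY, table_name TEXT, changed_on TEXT
    );
    CREATE TABLE new_datasets (
        id INTEGER PRIMARY KEY, name TEXT, changed_on TEXT
    );
    INSERT INTO legacy_tables VALUES
        (7, 'sales', '2022-03-01'),
        (12, 'users', '2022-03-15');
    """
)

copy_with_original_ids(conn)
rows = conn.execute(
    "SELECT id, name, changed_on FROM new_datasets ORDER BY id"
).fetchall()
```

With matching IDs, spot-checking a copied row only requires looking up the same `id` in both tables.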
   
   Happy to incorporate 
[#19487](https://github.com/apache/superset/pull/19487/) and 
[#19425](https://github.com/apache/superset/pull/19425) into [my 
PR](https://github.com/apache/superset/pull/19421) if they are still needed.
   
   Btw, I think the `Dataset` model may need a `database_id` column as well. 
There is an implicit assumption that a dataset can only run on one database; I 
cannot imagine a case where we need to support a virtual dataset being used on 
different tables in different databases. Having a direct link to the database 
ensures existing virtual datasets can be linked to the correct database without 
relying on an unreliable table name extraction process. Currently, if table name 
extraction fails, a virtual dataset loses its association with the correct 
table, and hence its only link to the database. Recovering the correct database 
id would then require joining `SqlaTable` on `sqlatable_id`. 
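
As a rough sketch of the difference, the snippet below compares looking up the database through a join on `sqlatable_id` with reading a direct `database_id` column after a one-off backfill. It uses in-memory SQLite, and the table names (`legacy_tables`, `new_datasets`) and helper functions are hypothetical simplifications, not the real models.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical, simplified tables for illustration only.
    CREATE TABLE legacy_tables (
        id INTEGER PRIMARY KEY, database_id INTEGER NOT NULL
    );
    CREATE TABLE new_datasets (
        id INTEGER PRIMARY KEY, sqlatable_id INTEGER, database_id INTEGER
    );
    INSERT INTO legacy_tables VALUES (1, 10), (2, 20);
    INSERT INTO new_datasets (id, sqlatable_id) VALUES (1, 1), (2, 2);
    """
)


def database_id_via_join(conn, dataset_id):
    # Without a direct column, the database is only reachable
    # through the legacy table referenced by sqlatable_id.
    row = conn.execute(
        """
        SELECT t.database_id
        FROM new_datasets d
        JOIN legacy_tables t ON t.id = d.sqlatable_id
        WHERE d.id = ?
        """,
        (dataset_id,),
    ).fetchone()
    return row[0] if row else None


def backfill_database_id(conn):
    # One-off backfill of the proposed direct link.
    conn.execute(
        """
        UPDATE new_datasets
        SET database_id = (
            SELECT t.database_id
            FROM legacy_tables t
            WHERE t.id = new_datasets.sqlatable_id
        )
        """
    )
    conn.commit()


def database_id_direct(conn, dataset_id):
    # With the column in place, no join or table name parsing is needed.
    row = conn.execute(
        "SELECT database_id FROM new_datasets WHERE id = ?", (dataset_id,)
    ).fetchone()
    return row[0] if row else None


backfill_database_id(conn)
```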


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@superset.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


