villebro opened a new pull request #14638:
URL: https://github.com/apache/superset/pull/14638


   ### SUMMARY
   A recent PR, #14547, introduced a performance regression that made dataset 
metadata fetching very slow for datasets with a large number of columns. I 
originally thought the type regexes were the problem, but on closer 
investigation it turned out that merely referencing 
`self.table.database.db_engine_spec` in a `TableColumn` instance cost ~6 ms on 
my local machine. Multiplied by 1000 columns, that comes to ~6000 ms. To get 
around this I memoized the semi-expensive regex matching, and also memoized 
`db_engine_spec` to avoid refetching the object through the SQL relationships.
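
   The two kinds of memoization described above can be illustrated with a minimal, standalone sketch. It is not the code in this PR: the `Database` and `Table` stand-ins, the `lookups` counter, the `is_numeric_type` helper and its regex are all hypothetical, and `functools.cached_property` / `functools.lru_cache` are used here only as an assumption about how such caching could be done.

```python
import re
from functools import cached_property, lru_cache


class Database:
    """Hypothetical stand-in for the model reached through SQL relationships."""

    def __init__(self):
        self.lookups = 0  # counts how often the expensive lookup runs

    @property
    def db_engine_spec(self):
        # Pretend this is the slow part (~6 ms per access in the PR description).
        self.lookups += 1
        return "example-engine-spec"


class Table:
    """Hypothetical stand-in for the dataset that owns the columns."""

    def __init__(self, database):
        self.database = database


class TableColumn:
    def __init__(self, table, type_name):
        self.table = table
        self.type = type_name

    @cached_property
    def db_engine_spec(self):
        # Resolved once per column instance, then served from the instance dict.
        return self.table.database.db_engine_spec


# Hypothetical type pattern; the real patterns live in the engine specs.
_NUMERIC = re.compile(r"^(INT|BIGINT|FLOAT|DOUBLE|DECIMAL|NUMERIC)", re.IGNORECASE)


@lru_cache(maxsize=None)
def is_numeric_type(type_name):
    # Cached per distinct type string, so repeated column types match only once.
    return bool(_NUMERIC.match(type_name))
```

   With this shape, reading `column.db_engine_spec` repeatedly on the same `TableColumn` triggers the underlying lookup only once, and columns that share a type string share a single regex evaluation.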
   ### BEFORE #14547 (pre-regression)
   For the World Bank dataset (328 columns), fetching the data took ~150 ms on 
my local machine, plus a 20 ms redirect.
   
![image](https://user-images.githubusercontent.com/33317356/118236149-665aef00-b49e-11eb-872f-7f4cbfc637d9.png)
   
   ### CURRENT (master)
   For the same dataset, retrieval of data now takes ~10s!
   
![image](https://user-images.githubusercontent.com/33317356/118236606-f00abc80-b49e-11eb-9bc7-ab907f8b9084.png)
   
   ### AFTER
   Retrieval now takes ~200 ms: slightly longer than the original ~150 ms, but 
not considerably so once the removed 20 ms redirect is taken into account.
   
![image](https://user-images.githubusercontent.com/33317356/118236817-38c27580-b49f-11eb-8d24-9a89077ac04a.png)
   
   ### TEST PLAN
   <!--- What steps should be taken to verify the changes -->
   
   ### ADDITIONAL INFORMATION
   <!--- Check any relevant boxes with "x" -->
   <!--- HINT: Include "Fixes #nnn" if you are fixing an existing issue -->
   - [ ] Has associated issue:
   - [ ] Changes UI
   - [ ] Includes DB Migration (follow approval process in 
[SIP-59](https://github.com/apache/superset/issues/13351))
     - [ ] Migration is atomic, supports rollback & is backwards-compatible
     - [ ] Confirm DB migration upgrade and downgrade tested
     - [ ] Runtime estimates and downtime expectations provided
   - [ ] Introduces new feature or API
   - [ ] Removes existing feature or API
   




