etr2460 opened a new pull request, #22929:
URL: https://github.com/apache/superset/pull/22929

   ### SUMMARY
   Hey folks, long time no see! 😬 
   
![image](https://user-images.githubusercontent.com/7409244/215923574-d68f971d-1a8c-468b-92f9-9f16a11d9b16.png)
   
   In a recent release, we noticed a significant regression in CSV downloads 
from SQL Lab: people who used to download hundreds of thousands of rows 
of results within our 5-minute webserver timeout were no longer able to. After 
digging into Datadog traces, I saw that gunicorn was killing the process while 
it was stuck here:
   
![image](https://user-images.githubusercontent.com/7409244/215923746-697c5e1a-2c6e-4b89-8c8c-0349c59e38c5.png)
   
   Looking into the code, this is in the middle of a scary nested loop 
(O(rows × columns)) that iterates through every cell of the CSV output (which, 
as mentioned before, could be on the order of millions of cells) and escapes 
string values if necessary: 
https://github.com/apache/superset/blob/master/superset/utils/csv.py#L42-L80
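
For readers who don't want to click through, the shape of that loop is roughly the following. This is a simplified sketch, not the actual Superset code: `RISKY_PREFIXES`, `escape_value`, and `escape_cells_in_place` are hypothetical names used for illustration, and the use of `df.at` for the per-cell write is an assumption about how the update is done.

```python
import pandas as pd

# Hypothetical escape rule for illustration only; the real logic lives in
# superset/utils/csv.py and may differ.
RISKY_PREFIXES = ("=", "+", "-", "@")


def escape_value(value: str) -> str:
    """Prefix a quote so spreadsheet apps do not treat the cell as a formula."""
    if value.startswith(RISKY_PREFIXES):
        return "'" + value
    return value


def escape_cells_in_place(df: pd.DataFrame) -> pd.DataFrame:
    """Visit every cell and rewrite string values one at a time."""
    for name, column in df.items():
        for idx, value in zip(df.index, column.values):
            if isinstance(value, str):
                # One scalar write into the DataFrame per string cell:
                # with millions of cells, this line dominates the runtime.
                df.at[idx, name] = escape_value(value)
    return df
```

The point is only that the cost of a single scalar write gets multiplied by the number of string cells, so any slowdown in that one operation shows up directly in download time.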
   
   So something changed in performance here, and after digging in further, I 
noticed that we upgraded pandas in 
https://github.com/apache/superset/pull/22217. Since every iteration of that 
loop updates a value in a pandas DataFrame, I took a trip over to the pandas 
repo and found this issue reporting a performance regression in 1.5.1 (we 
upgraded to 1.5.2) where someone was also updating a DataFrame in a loop: 
https://github.com/pandas-dev/pandas/issues/49729. That issue was resolved and 
the fix was backported to 1.5.3, so this PR upgrades pandas to 1.5.3.
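
If anyone wants to compare pandas versions locally, a rough timing harness like the one below might help. It is only a sketch: `time_scalar_writes` is a hypothetical helper, the column name and row count are arbitrary, and it may or may not hit the exact code path from the linked pandas issue.

```python
import time

import pandas as pd


def time_scalar_writes(n_rows: int = 20_000) -> float:
    """Return the seconds spent doing one scalar `df.at` write per row."""
    # Object/string column, similar in spirit to SQL Lab text results.
    df = pd.DataFrame({"text": ["value"] * n_rows})
    start = time.perf_counter()
    for idx in range(n_rows):
        df.at[idx, "text"] = "escaped"
    return time.perf_counter() - start
```

Running `print(pd.__version__, time_scalar_writes())` in separate virtualenvs with pandas 1.5.2 and 1.5.3 installed should give a rough idea of whether per-cell writes differ between the two.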
   
   I sure hope this works.
   
   ### TESTING INSTRUCTIONS
   CI
   
   to: @john-bodley @ktmud @villebro @EugeneTorap (not sure who else to add 
on this, honestly)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@superset.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

