[ https://issues.apache.org/jira/browse/SPARK-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marcelo Vanzin updated SPARK-20657:
-----------------------------------
    Target Version/s: 2.3.0

I'm targeting 2.3.0 since the stages page is really slow for large apps without this fix. The fix also changes the data saved to the disk store, so adding this to a later release would require revving the store version and re-parsing event logs, so better to avoid that.

The patch has been up for review for a while but nobody has looked at it yet.

> Speed up Stage page
> -------------------
>
>                 Key: SPARK-20657
>                 URL: https://issues.apache.org/jira/browse/SPARK-20657
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Web UI
>    Affects Versions: 2.3.0
>            Reporter: Marcelo Vanzin
>
> The Stage page in the UI is very slow when a large number of tasks exist
> (tens of thousands). The new work being done in SPARK-18085 makes that worse,
> since it adds potential disk access to the mix.
> A lot of the slowness is because the code loads all the tasks in memory then
> sorts a really large list, and does a lot of calculations on all the data;
> both can be avoided with the new app state store by having smarter indices
> (so data is read from the store sorted in the desired order) and by keeping
> statistics about metrics pre-calculated (instead of re-doing that on every
> page access).
> Then only the tasks on the current page (100 items by default) need to
> actually be loaded. This also saves a lot on memory usage, not just CPU time.
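
The approach described in the issue (an index that returns tasks already sorted, statistics maintained at write time, and loading only the current page) can be sketched roughly as follows. This is an illustrative in-memory Python model, not Spark's code and not the app state store's API: `TaskStore`, `MetricSummary`, `add_task`, `page`, and `summary` are hypothetical names, and the `duration` metric is an assumed example.

```python
import bisect
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetricSummary:
    """Running statistics for one metric, updated on every write (hypothetical)."""
    count: int = 0
    total: int = 0
    minimum: Optional[int] = None
    maximum: Optional[int] = None

    def update(self, value: int) -> None:
        self.count += 1
        self.total += value
        self.minimum = value if self.minimum is None else min(self.minimum, value)
        self.maximum = value if self.maximum is None else max(self.maximum, value)


class TaskStore:
    """In-memory stand-in for a store that keeps one sorted index per metric."""

    def __init__(self, metrics):
        self._tasks = {}                            # task_id -> metric values
        self._indices = {m: [] for m in metrics}    # metric -> sorted (value, task_id)
        self._summaries = {m: MetricSummary() for m in metrics}

    def add_task(self, task_id, values):
        """Store a task, keeping every index sorted and every summary current."""
        self._tasks[task_id] = values
        for metric, index in self._indices.items():
            bisect.insort(index, (values[metric], task_id))
            self._summaries[metric].update(values[metric])

    def page(self, sort_by, page=0, page_size=100, descending=False):
        """Return one page of (task_id, values), read in index order.

        Only the index entries for the requested page are sliced; no full
        sort of the task list happens at read time.
        """
        index = self._indices[sort_by]
        start = page * page_size
        if descending:
            hi = max(len(index) - start, 0)
            lo = max(hi - page_size, 0)
            keys = index[lo:hi][::-1]
        else:
            keys = index[start:start + page_size]
        return [(task_id, self._tasks[task_id]) for _, task_id in keys]

    def summary(self, metric):
        """Return the pre-calculated statistics without touching the tasks."""
        return self._summaries[metric]


if __name__ == "__main__":
    store = TaskStore(["duration"])
    for task_id in range(50_000):
        store.add_task(task_id, {"duration": (task_id * 37) % 1000})
    slowest = store.page("duration", page=0, descending=True)
    print(len(slowest), store.summary("duration"))
```

The list-based index makes each write linear in the number of tasks, which is acceptable for a sketch but not for a real store; the point is only that the cost moves from every page view to the write path, and that a page view touches about `page_size` tasks instead of all of them.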