Marcelo Vanzin created SPARK-20657:
--------------------------------------

             Summary: Speed up Stage page
                 Key: SPARK-20657
                 URL: https://issues.apache.org/jira/browse/SPARK-20657
             Project: Spark
          Issue Type: Sub-task
          Components: Web UI
    Affects Versions: 2.3.0
            Reporter: Marcelo Vanzin


The Stage page in the UI is very slow when a large number of tasks exist (tens 
of thousands). The new work being done in SPARK-18085 makes that worse, since 
it adds potential disk access to the mix.

Much of the slowness comes from the code loading all the tasks into memory, 
sorting a very large list, and then recomputing metrics over all the data. Both 
costs can be avoided with the new app state store: smarter indices let data be 
read from the store already sorted in the desired order, and statistics about 
metrics can be kept pre-calculated instead of being redone on every page access.
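
As a rough illustration of the pre-calculated statistics idea, the sketch below 
updates a running summary each time a task finishes, so rendering the page only 
reads a few numbers instead of scanning every task. It is not Spark code: the 
names MetricStats, stage_stats and on_task_end are hypothetical, and the choice 
of count/sum/min/max as the summary is an assumption made for brevity.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class MetricStats:
    """Running summary of one task metric (hypothetical helper)."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def add(self, value: float) -> None:
        # Constant-time update; no previously seen task is revisited.
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else 0.0


# Hypothetical per-stage summary, keyed by metric name.
stage_stats: Dict[str, MetricStats] = {}


def on_task_end(metrics: Dict[str, float]) -> None:
    """Fold one finished task's metrics into the running summaries."""
    for name, value in metrics.items():
        stage_stats.setdefault(name, MetricStats()).add(value)
```

Summaries that need order statistics (a median, for example) cannot be kept 
this cheaply and would need a different approach, such as reading from a sorted 
index.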

Then only the tasks on the current page (100 items by default) need to actually 
be loaded. This also saves a lot on memory usage, not just CPU time.
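
A minimal sketch of that access pattern, assuming an index that is kept sorted 
as tasks are written: a page is then just a slice of the index, and only the 
tasks in that slice are fetched. TaskIndex and its methods are hypothetical 
names, and the in-memory dict merely stands in for whatever store holds the 
task data.

```python
import bisect
from typing import Dict, List, Tuple


class TaskIndex:
    """Keeps (sort key, task id) pairs ordered so a page is a slice, not a sort."""

    def __init__(self) -> None:
        self._sorted: List[Tuple[float, int]] = []
        # Stand-in for the backing store; a real one could live on disk.
        self._tasks: Dict[int, dict] = {}

    def add(self, task_id: int, sort_key: float, task: dict) -> None:
        # Insert in sorted position at write time instead of sorting on read.
        bisect.insort(self._sorted, (sort_key, task_id))
        self._tasks[task_id] = task

    def page(self, page_number: int, page_size: int = 100) -> List[dict]:
        """Return the tasks on a 1-based page, loading only that page's entries."""
        start = (page_number - 1) * page_size
        entries = self._sorted[start:start + page_size]
        return [self._tasks[task_id] for _, task_id in entries]
```

With 250 tasks indexed this way, page(1) touches 100 task records and page(3) 
touches 50; the other records are never read for that request.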



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
