Marcelo Vanzin created SPARK-20657:
--------------------------------------
Summary: Speed up Stage page
Key: SPARK-20657
URL: https://issues.apache.org/jira/browse/SPARK-20657
Project: Spark
Issue Type: Sub-task
Components: Web UI
Affects Versions: 2.3.0
Reporter: Marcelo Vanzin
The Stage page in the UI is very slow when a large number of tasks exist (tens
of thousands). The new work being done in SPARK-18085 makes that worse, since
it adds potential disk access to the mix.
Much of the slowness comes from the code loading all the tasks into memory,
sorting a very large list, and then running a lot of calculations over all the
data. The sorting and the calculations can both be avoided with the new app
state store by having smarter indices (so data is read from the store already
sorted in the desired order) and by keeping statistics about metrics
pre-calculated (instead of recomputing them on every page access).
Then only the tasks on the current page (100 items by default) need to actually
be loaded. This also saves a lot on memory usage, not just CPU time.
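
As a rough illustration of the idea (not the actual Spark code), here is a
minimal Python sketch. The names StageTaskIndex, RunningStats, add_task and
page are hypothetical, and an in-memory sorted list stands in for a real store
index. Statistics are updated when a task is written, and a page request reads
only one slice of the already sorted index.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RunningStats:
    """Summary statistics updated on each write (hypothetical helper)."""
    count: int = 0
    total: int = 0
    minimum: Optional[int] = None
    maximum: Optional[int] = None

    def add(self, value: int) -> None:
        self.count += 1
        self.total += value
        self.minimum = value if self.minimum is None else min(self.minimum, value)
        self.maximum = value if self.maximum is None else max(self.maximum, value)

    @property
    def mean(self) -> Optional[float]:
        return self.total / self.count if self.count else None


class StageTaskIndex:
    """Keeps tasks ordered by duration so a page is a slice, not a full sort."""

    def __init__(self) -> None:
        # Sorted list of (duration_ms, task_id); stands in for a store index.
        self._by_duration: List[Tuple[int, int]] = []
        self.stats = RunningStats()

    def add_task(self, task_id: int, duration_ms: int) -> None:
        # Insert in sorted position and update the pre-calculated statistics.
        bisect.insort(self._by_duration, (duration_ms, task_id))
        self.stats.add(duration_ms)

    def page(self, page_number: int, page_size: int = 100,
             descending: bool = False) -> List[Tuple[int, int]]:
        """Return one page (1-based) without touching the other tasks."""
        start = (page_number - 1) * page_size
        if not descending:
            return self._by_duration[start:start + page_size]
        # For descending order, take the slice from the tail and reverse it.
        hi = len(self._by_duration) - start
        if hi <= 0:
            return []
        lo = max(hi - page_size, 0)
        return self._by_duration[lo:hi][::-1]
```

With this layout, rendering a page costs work proportional to the page size
rather than to the number of tasks in the stage, and the summary metrics are
read from RunningStats instead of being recomputed. A list with bisect.insort
still pays linear cost per insert, so it only models the read path; a real
store would presumably use an indexed on-disk structure instead.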