[
https://issues.apache.org/jira/browse/SPARK-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087235#comment-14087235
]
Carlos Fuertes commented on SPARK-2016:
---------------------------------------
I ran some very simple benchmarks against the current master: the UI is still
unresponsive with big tables (a high number of blocks), even after the change
in SPARK-2316. However, if you switch to a solution where the table data is
served as JSON and the HTML table is built with JavaScript, the UI remains
responsive.
Here is a rough benchmark, run on an old MacBook laptop in local mode with
Chrome rendering the UI. I gathered the stats using the dev tools included in
Chrome:
> sc.parallelize(1 to 1000000, 50000).count()
The time to load ‘/storage/rdd/?id=0’ is:
- Current master takes ~11 secs, but once the page finishes loading it is
completely unusable, since it takes forever to scroll up or down. The size of
the page is 14.4MB.
- If I run the page with the modified CSS style, it loads a couple of seconds
faster but remains unresponsive after it loads. That corresponds to running my
pull request with “spark.ui.jsRenderingEnabled false”.
- With the JSON solution, the page without the blocks table appears instantly,
while the blocks table takes ~15 secs to load. After that, however, the page
is fully responsive.
From my limited tests I would say that it is a win using JavaScript with JSON
to render the page, since the page remains responsive and usable after loading
big tables.
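
To make the JSON approach concrete, here is a minimal sketch of how the client
side could turn a JSON payload of block records into table rows. It is only an
illustration and not the code in the pull request: the BlockInfo field names,
the function names, and the idea of splitting the rows into batches (so a
caller could append one batch at a time and let the browser stay responsive in
between) are all assumptions made for this example.

```typescript
// Hypothetical shape of one block record. The field names are illustrative
// and are not taken from Spark's UI code or from the pull request.
interface BlockInfo {
  blockName: string;
  storageLevel: string;
  sizeInMemory: number; // bytes
  executor: string;
}

// Escape text before placing it inside HTML markup.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build the <tr> markup for a single block record.
function renderRow(block: BlockInfo): string {
  const cells = [
    block.blockName,
    block.storageLevel,
    String(block.sizeInMemory),
    block.executor,
  ];
  return "<tr>" + cells.map((c) => "<td>" + escapeHtml(c) + "</td>").join("") + "</tr>";
}

// Split the records into fixed-size batches (assumed strategy, see above).
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (batchSize <= 0) {
    throw new Error("batchSize must be positive");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Parse the JSON payload and return one HTML fragment of rows per batch.
function renderBlocksTable(json: string, batchSize: number): string[] {
  const blocks = JSON.parse(json) as BlockInfo[];
  return toBatches(blocks, batchSize).map((batch) => batch.map(renderRow).join(""));
}
```

In a browser, each returned fragment could be appended to the table body in
turn, for example from a timer callback, so that the rest of the page is
usable while the blocks table fills in.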
> rdd in-memory storage UI becomes unresponsive when the number of RDD
> partitions is large
> ----------------------------------------------------------------------------------------
>
> Key: SPARK-2016
> URL: https://issues.apache.org/jira/browse/SPARK-2016
> Project: Spark
> Issue Type: Sub-task
> Reporter: Reynold Xin
> Labels: starter
>
> Try run
> {code}
> sc.parallelize(1 to 100, 1000000).cache().count()
> {code}
> And open the storage UI for this RDD. It takes forever to load the page.
> When the number of partitions is very large, I think there are a few
> alternatives:
> 0. Only show the top 1000.
> 1. Pagination
> 2. Instead of grouping by RDD blocks, group by executors