[ 
https://issues.apache.org/jira/browse/SPARK-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080539#comment-14080539
 ] 

Carlos Fuertes commented on SPARK-2017:
---------------------------------------

Hi, I have implemented, under https://github.com/apache/spark/pull/1682, the 
solution that serves the table data as JSON for the tasks under 'stages' and 
also for 'storage' (the latter is issue SPARK-2016, which boils down to the 
same underlying problem). 

The main addition is exposing paths that serve the JSON data:

/stages/stage/tasks/json/?id=nnn
/storage/json
/storage/rdd/workers/json?id=nnn
/storage/rdd/blocks/json?id=nnn

and using JavaScript to build the tables from an AJAX request for that JSON. 
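
As a rough illustration of the client side (a minimal sketch, not the code in 
the pull request), the table markup can be built from the JSON rows roughly as 
below. The row fields (index, status, durationMs) and the helper names 
escapeHtml, buildTable and loadTable are assumptions made for the example; the 
JSON actually served by the paths above may have a different shape.

```javascript
// Hypothetical row shape; the real JSON fields may differ.
const sampleTasks = [
  { index: 0, status: "SUCCESS", durationMs: 12 },
  { index: 1, status: "FAILED", durationMs: 40 },
];

// Escape text so task data cannot inject markup into the page.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build the table markup from an array of row objects.
// Missing fields are rendered as empty cells.
function buildTable(columns, rows) {
  const head = columns.map((c) => `<th>${escapeHtml(c)}</th>`).join("");
  const body = rows
    .map((row) => {
      const cells = columns
        .map((c) => `<td>${escapeHtml(row[c] === undefined ? "" : row[c])}</td>`)
        .join("");
      return `<tr>${cells}</tr>`;
    })
    .join("");
  return `<table><thead><tr>${head}</tr></thead><tbody>${body}</tbody></table>`;
}

// Browser-only usage sketch (defined here, never called): request the JSON
// asynchronously and fill a container element once the response arrives.
async function loadTable(url, columns, container) {
  const response = await fetch(url);
  const rows = await response.json();
  container.innerHTML = buildTable(columns, rows);
}
```

Keeping buildTable free of DOM access means the page can show the Summary 
table first and fill in the task table whenever the request completes.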

This partially solves the responsiveness issue, since the data is served 
asynchronously from the loading of the page. However, the driver still sends 
all the data again on every refresh, so with a very large number of tasks the 
response takes longer and longer to send as the tasks progress. At least the 
Summary table now loads much faster, with no need to wait for the whole task 
table to complete.

A better solution would be to stream the data in chunks as they become ready, 
or to keep a cache of the previous results. I have not explored the latter yet, 
but the above could be a starting point to build on.
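
One possible shape for the chunking and caching idea, purely as a sketch: the 
client keeps the rows it has already received and merges each new chunk into 
them, so a refresh only needs to carry rows that are new or have changed. 
Nothing here exists in the pull request; TaskCache, chunked and the index 
field used as the row key are hypothetical names chosen for the example.

```javascript
// Client-side cache of task rows already received, keyed by task index.
class TaskCache {
  constructor() {
    this.rows = new Map();
  }

  // Merge a chunk of rows; a later copy of the same task replaces the
  // earlier one. Returns the number of distinct tasks cached so far.
  merge(chunk) {
    for (const row of chunk) {
      this.rows.set(row.index, row);
    }
    return this.rows.size;
  }

  // All cached rows, ordered by task index, ready for rendering.
  sorted() {
    return [...this.rows.values()].sort((a, b) => a.index - b.index);
  }
}

// Split a full result into fixed-size chunks, as a server might do when
// sending the data piece by piece instead of in one large response.
function* chunked(rows, size) {
  for (let i = 0; i < rows.length; i += size) {
    yield rows.slice(i, i + size);
  }
}

// Usage: feed chunks into the cache as they arrive.
const cache = new TaskCache();
const allRows = [
  { index: 0, status: "RUNNING" },
  { index: 1, status: "RUNNING" },
  { index: 2, status: "RUNNING" },
];
for (const chunk of chunked(allRows, 2)) {
  cache.merge(chunk);
}
// A later refresh only carries the task whose state changed.
cache.merge([{ index: 1, status: "SUCCESS" }]);
```

With this arrangement the cost of a refresh depends on how many tasks changed 
since the last one, not on the total number of tasks in the stage.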


> web ui stage page becomes unresponsive when the number of tasks is large
> ------------------------------------------------------------------------
>
>                 Key: SPARK-2017
>                 URL: https://issues.apache.org/jira/browse/SPARK-2017
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Web UI
>            Reporter: Reynold Xin
>              Labels: starter
>
> {code}
> sc.parallelize(1 to 1000000, 1000000).count()
> {code}
> The above code creates one million tasks to be executed. The stage detail web 
> ui page takes forever to load (if it ever completes).
> There are again a few different alternatives:
> 0. Limit the number of tasks we show.
> 1. Pagination
> 2. By default only show the aggregate metrics and failed tasks, and hide the 
> successful ones.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
