[ 
https://issues.apache.org/jira/browse/SPARK-4598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14228677#comment-14228677
 ] 

Josh Rosen commented on SPARK-4598:
-----------------------------------

I was able to reproduce this issue using the SparkPi example.

I captured a heap dump in YourKit and it looks like the raw, uncompressed HTML 
of the Stage page is over 75 megabytes and the Scala XML tree corresponding to 
the page is hundreds of megabytes (~200).

The actual HTML itself should be highly compressible, since it contains a lot 
of redundancy.  In the longer term, we could also explore approaches that 
perform more of the rendering / formatting in the browser using JavaScript; 
this would allow us to send the task table data as JSON or CSV, which would 
contain much less redundancy; we could also avoid the overheads of the XML 
library.
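
To make the redundancy point concrete, here is a rough sketch (not code from Spark) comparing one task rendered as an HTML table row with the same task rendered as a CSV line. The {{TaskRow}} class, its fields, and the {{TaskTableFormats}} helper are hypothetical names made up for illustration, and the CSV writer assumes field values contain no commas or quotes.

```scala
// Hypothetical stand-in for one row of the task table; not Spark's actual class.
final case class TaskRow(id: Long, status: String, durationMs: Long)

object TaskTableFormats {
  // Server-side rendering: every row repeats the same tags and attributes.
  def toHtmlRow(t: TaskRow): String =
    s"""<tr class="task"><td class="id">${t.id}</td>""" +
      s"""<td class="status">${t.status}</td>""" +
      s"""<td class="duration">${t.durationMs} ms</td></tr>"""

  // Data-only rendering: just the values. Assumes no commas or quotes in fields.
  def toCsvRow(t: TaskRow): String =
    s"${t.id},${t.status},${t.durationMs}"

  // Header line followed by one line per task, produced lazily from an iterator.
  def toCsv(tasks: Iterator[TaskRow]): Iterator[String] =
    Iterator("id,status,durationMs") ++ tasks.map(toCsvRow)
}
```

The per-row markup overhead is paid once per task in the HTML form, so with more than 100,000 tasks it dominates the page size; in the CSV form the column names appear only once, in the header.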

As a shorter-term hack, though, I wonder whether there's some trick to reduce 
the overall memory usage of the intermediate scala.xml data structures, since 
it seems odd that we end up materializing such a large object graph when it 
seems like large portions of it could be lazily streamed.  Maybe there's some 
simple trick where sprinkling in a few {{.iterator}} calls would improve things.
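
A minimal sketch of the streaming idea, under stated assumptions: it writes each row straight to a {{java.io.Writer}} from an {{Iterator}} instead of building the whole table as one in-memory tree. Note that this sidesteps scala.xml entirely, so it is a related technique rather than the {{.iterator}} tweak itself; whether {{.iterator}} calls inside scala.xml literals actually avoid materialization is not something this sketch demonstrates. {{TaskSummary}} and {{StreamingTable}} are hypothetical names, not Spark classes.

```scala
import java.io.Writer

// Hypothetical stand-in for one task's display data; not Spark's actual class.
final case class TaskSummary(id: Long, status: String, durationMs: Long)

object StreamingTable {
  // Minimal HTML escaping for text content.
  def escape(s: String): String =
    s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")

  // Renders a single row as an HTML fragment.
  def renderRow(t: TaskSummary): String =
    s"<tr><td>${t.id}</td><td>${escape(t.status)}</td><td>${t.durationMs}</td></tr>"

  // Writes rows one at a time. Each row's markup becomes garbage as soon as it
  // has been written, so this method never holds the full table in memory.
  def writeTable(tasks: Iterator[TaskSummary], out: Writer): Unit = {
    out.write("<table>")
    tasks.foreach(t => out.write(renderRow(t)))
    out.write("</table>")
  }
}
```

With this shape, memory held by the rendering code is bounded by one row at a time, provided the caller passes a lazy iterator and the {{Writer}} is backed by the response stream rather than an in-memory buffer.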

> Paginate stage page to avoid OOM with > 100,000 tasks
> -----------------------------------------------------
>
>                 Key: SPARK-4598
>                 URL: https://issues.apache.org/jira/browse/SPARK-4598
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.2.0
>            Reporter: meiyoula
>            Priority: Critical
>
> In the HistoryServer stage page, clicking the task href in Description 
> triggers a GC error. The detailed error message is:
> 2014-11-17 16:36:30,851 | WARN  | [qtp1083955615-352] | Error for 
> /history/application_1416206401491_0010/stages/stage/ | 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:590)
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> 2014-11-17 16:36:30,851 | WARN  | [qtp1083955615-364] | handle failed | 
> org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:697)
> java.lang.OutOfMemoryError: GC overhead limit exceeded



