[ https://issues.apache.org/jira/browse/SPARK-4598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14228677#comment-14228677 ]
Josh Rosen commented on SPARK-4598:
-----------------------------------
I was able to reproduce this issue using the SparkPi example.
I captured a heap dump in YourKit, and it looks like the raw, uncompressed HTML
of the Stage page is over 75 megabytes, while the Scala XML tree corresponding
to the page takes roughly 200 megabytes.
The actual HTML itself should be highly compressible, since it contains a lot
of redundancy. In the longer term, we could also explore approaches that
perform more of the rendering and formatting in the browser using JavaScript.
This would allow us to send the task table data as JSON or CSV, which would
contain much less redundancy, and would also avoid the overheads of the XML
library.
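To get a feel for how much of that redundancy gzip or a compact format would remove, here is a minimal Java sketch (using only {{java.util.zip}}) that renders the same synthetic task rows as verbose HTML and as CSV and measures each before and after compression. The class name, columns, and markup are made up for illustration and are far simpler than the real Stage page, so the absolute sizes say nothing about Spark itself; only the relative sizes are meaningful.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: the markup and columns are illustrative, not Spark's real Stage page.
public class TaskTableSizeSketch {

    // Render n synthetic task rows as verbose, repetitive HTML table rows.
    static String renderHtml(int n) {
        StringBuilder sb = new StringBuilder("<table class=\"task-table\">\n");
        for (int i = 0; i < n; i++) {
            sb.append("<tr><td class=\"task-id\">").append(i)
              .append("</td><td class=\"task-status\">SUCCESS</td>")
              .append("<td class=\"task-duration\">").append(i % 97)
              .append(" ms</td></tr>\n");
        }
        return sb.append("</table>\n").toString();
    }

    // Render the same rows as CSV: a header line, then one line per task.
    static String renderCsv(int n) {
        StringBuilder sb = new StringBuilder("id,status,durationMs\n");
        for (int i = 0; i < n; i++) {
            sb.append(i).append(",SUCCESS,").append(i % 97).append('\n');
        }
        return sb.toString();
    }

    // Number of bytes in the UTF-8 encoding of s.
    static int rawSize(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes after gzip-compressing the UTF-8 encoding of s.
    static int gzipSize(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(s.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            // An in-memory stream should not fail; rethrow unchecked if it somehow does.
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        int n = 100_000; // the task count from the issue title
        String html = renderHtml(n);
        String csv = renderCsv(n);
        System.out.printf("HTML: %d bytes raw, %d bytes gzipped%n", rawSize(html), gzipSize(html));
        System.out.printf("CSV:  %d bytes raw, %d bytes gzipped%n", rawSize(csv), gzipSize(csv));
    }
}
```

Note that compressing the response only shrinks what goes over the wire; the server would still build the full page in memory first, so this addresses transfer size rather than the heap problem.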
As a shorter-term hack, though, I wonder whether there's some trick to reduce
the overall memory usage of the intermediate scala.xml data structures. It
seems odd that we end up materializing such a large object graph when large
portions of it could be lazily streamed. Maybe there's some simple trick where
sprinkling in a few {{.iterator}} calls would improve things.
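The general idea of lazy streaming can be sketched without scala.xml at all. This is not the scala.xml {{.iterator}} trick itself; it swaps in the simpler technique of writing each row straight to a {{java.io.Writer}} as it is pulled from an iterator, so no tree for the whole table is ever held in memory. {{TaskRow}}, {{writeTable}}, and the markup are hypothetical; in a servlet, the writer would presumably be the response's writer.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Iterator;
import java.util.stream.IntStream;

// Hypothetical sketch: TaskRow and the markup are illustrative, not Spark's UI code.
public class StreamingTaskTable {

    // Minimal stand-in for per-task data; real task metrics have many more fields.
    static final class TaskRow {
        final long id;
        final String status;
        TaskRow(long id, String status) { this.id = id; this.status = status; }
    }

    // Escape the characters that are significant in HTML text and attribute values.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '&': sb.append("&amp;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Pull rows one at a time and write each immediately; no node tree is built.
    // Returns the number of rows written.
    static int writeTable(Iterator<TaskRow> rows, Writer out) {
        int count = 0;
        try {
            out.write("<table>\n");
            while (rows.hasNext()) {
                TaskRow r = rows.next();
                out.write("<tr><td>" + r.id + "</td><td>" + escape(r.status) + "</td></tr>\n");
                count++;
            }
            out.write("</table>\n");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        // Rows are produced lazily by the stream, one per next() call.
        Iterator<TaskRow> rows = IntStream.range(0, 3)
            .mapToObj(i -> new TaskRow(i, i == 2 ? "FAILED <oom>" : "SUCCESS"))
            .iterator();
        StringWriter out = new StringWriter();
        writeTable(rows, out);
        System.out.print(out);
    }
}
```

One trade-off of streaming: if rendering fails partway through, the client receives a truncated page, whereas building the whole tree first lets the server fail before sending any bytes.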
> Paginate stage page to avoid OOM with > 100,000 tasks
> -----------------------------------------------------
>
> Key: SPARK-4598
> URL: https://issues.apache.org/jira/browse/SPARK-4598
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.2.0
> Reporter: meiyoula
> Priority: Critical
>
> On the HistoryServer stage page, clicking the task href in Description triggers
> a GC error. The detailed error message is:
> 2014-11-17 16:36:30,851 | WARN | [qtp1083955615-352] | Error for
> /history/application_1416206401491_0010/stages/stage/ |
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:590)
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> 2014-11-17 16:36:30,851 | WARN | [qtp1083955615-364] | handle failed |
> org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:697)
> java.lang.OutOfMemoryError: GC overhead limit exceeded
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)