[
https://issues.apache.org/jira/browse/SPARK-19814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Simon King closed SPARK-19814.
------------------------------
Resolution: Duplicate
It looks like it was wrong to characterize this as a bug -- we couldn't identify an
actual memory leak. It seems we'll have to wait for the major overhaul
proposed in https://issues.apache.org/jira/browse/SPARK-18085.
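
For context on why this is not being treated as a leak: the history server keeps fully
loaded application UIs in an in-memory cache bounded by entry count
(spark.history.retainedApplications), not by bytes, so a few applications with very
large event logs can legitimately fill a fixed heap. The Scala sketch below only
illustrates that count-vs-bytes distinction; it is not the actual ApplicationCache
implementation (which, as far as I can tell, is built on a Guava cache).

    import scala.collection.mutable

    // Illustrative sketch only, NOT Spark's ApplicationCache: a cache bounded by
    // entry count (like spark.history.retainedApplications). Nothing limits how
    // many bytes each cached value holds, so a fixed heap can fill with no leak.
    class CountBoundedCache[K, V](maxEntries: Int, load: K => V) {
      private val entries = mutable.LinkedHashMap.empty[K, V]

      def get(key: K): V = synchronized {
        val value = entries.remove(key).getOrElse(load(key))
        entries.put(key, value)               // re-insert as most recently used
        if (entries.size > maxEntries) {
          entries.remove(entries.head._1)     // evict the least recently used entry
        }
        value
      }
    }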
> Spark History Server Out Of Memory / Extreme GC
> -----------------------------------------------
>
> Key: SPARK-19814
> URL: https://issues.apache.org/jira/browse/SPARK-19814
> Project: Spark
> Issue Type: Bug
> Components: Web UI
> Affects Versions: 1.6.1, 2.0.0, 2.1.0
> Environment: Spark History Server (we've run it on several different
> Hadoop distributions)
> Reporter: Simon King
> Attachments: SparkHistoryCPUandRAM.png
>
>
> The Spark History Server runs out of memory, falls into GC thrashing, and
> eventually becomes unresponsive. This seems to happen more quickly with heavy
> use of the REST API. We've seen this with several versions of Spark.
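> To give a concrete picture of the access pattern, the kind of REST polling that
> seems to accelerate this looks roughly like the sketch below (host, port, and
> application ID are placeholders; the /api/v1 paths are the documented history
> server REST endpoints). Each per-application request makes the server load that
> application's full UI into its in-memory cache, which is where the heap pressure
> appears to come from.
>
>     import scala.io.Source
>
>     // Illustrative sketch of a monitoring job polling the history server REST
>     // API. The host, port, and application ID below are placeholders.
>     object PollHistoryServer {
>       val base = "http://history-server:18080/api/v1"
>
>       def fetch(path: String): String = {
>         val src = Source.fromURL(s"$base/$path")
>         try src.mkString finally src.close()
>       }
>
>       def main(args: Array[String]): Unit = {
>         val apps = fetch("applications")      // list completed applications
>         val jobs = fetch("applications/app-20170225000000-0001/jobs")
>         println(s"${apps.length} bytes, ${jobs.length} bytes")
>       }
>     }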
> We are running with the following settings (Spark 2.1):
> spark.history.fs.cleaner.enabled true
> spark.history.fs.cleaner.interval 1d
> spark.history.fs.cleaner.maxAge 7d
> spark.history.retainedApplications 500
> We will eventually get errors like:
> 17/02/25 05:02:19 WARN ServletHandler:
> javax.servlet.ServletException: scala.MatchError: java.lang.OutOfMemoryError: GC overhead limit exceeded (of class java.lang.OutOfMemoryError)
> at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:489)
> at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:427)
> at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388)
> at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341)
> at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228)
> at org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:812)
> at org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:587)
> at org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
> at org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
> at org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
> at org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> at org.spark_project.jetty.servlets.gzip.GzipHandler.handle(GzipHandler.java:529)
> at org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
> at org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
> at org.spark_project.jetty.server.Server.handle(Server.java:499)
> at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:311)
> at org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
> at org.spark_project.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
> at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
> at org.spark_project.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: scala.MatchError: java.lang.OutOfMemoryError: GC overhead limit exceeded (of class java.lang.OutOfMemoryError)
> at org.apache.spark.deploy.history.ApplicationCache.getSparkUI(ApplicationCache.scala:148)
> at org.apache.spark.deploy.history.HistoryServer.getSparkUI(HistoryServer.scala:110)
> at org.apache.spark.status.api.v1.UIRoot$class.withSparkUI(ApiRootResource.scala:244)
> at org.apache.spark.deploy.history.HistoryServer.withSparkUI(HistoryServer.scala:49)
> at org.apache.spark.status.api.v1.ApiRootResource.getJobs(ApiRootResource.scala:66)
> at sun.reflect.GeneratedMethodAccessor102.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.glassfish.jersey.server.internal.routing.SubResourceLocatorRouter$1.run(SubResourceLocatorRouter.java:158)
> at org.glassfish.jersey.server.internal.routing.SubResourceLocatorRouter.getResource(SubResourceLocatorRouter.java:178)
> at org.glassfish.jersey.server.internal.routing.SubResourceLocatorRouter.apply(SubResourceLocatorRouter.java:109)
> at org.glassfish.jersey.server.internal.routing.RoutingStage._apply(RoutingStage.java:109)
> at org.glassfish.jersey.server.internal.routing.RoutingStage._apply(RoutingStage.java:112)
> at org.glassfish.jersey.server.internal.routing.RoutingStage._apply(RoutingStage.java:112)
> at org.glassfish.jersey.server.internal.routing.RoutingStage._apply(RoutingStage.java:112)
> at org.glassfish.jersey.server.internal.routing.RoutingStage._apply(RoutingStage.java:112)
> at org.glassfish.jersey.server.internal.routing.RoutingStage.apply(RoutingStage.java:92)
> at org.glassfish.jersey.server.internal.routing.RoutingStage.apply(RoutingStage.java:61)
> at org.glassfish.jersey.process.internal.Stages.process(Stages.java:197)
> at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:318)
> at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
> at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
> at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
> at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
> at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
> at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317)
> at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:305)
> at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1154)
> at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:473)
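> The OutOfMemoryError is the real problem here; the scala.MatchError is a
> secondary symptom. A minimal sketch (not the actual Spark source) of how that
> happens: a pattern match over a wrapped exception's cause that only anticipates
> Exception subclasses has no case for java.lang.Error, so an OutOfMemoryError
> surfaces as "scala.MatchError ... (of class java.lang.OutOfMemoryError)" instead
> of propagating as itself.
>
>     // Minimal illustrative sketch, not the Spark code at ApplicationCache.scala:148.
>     object MatchErrorDemo {
>       def unwrap(wrapped: Throwable): Nothing = wrapped.getCause match {
>         case nsee: NoSuchElementException => throw nsee  // expected "not found" case
>         case e: Exception                 => throw e     // other expected failures
>         // no case for java.lang.Error, so an OOM cause becomes a scala.MatchError
>       }
>
>       def main(args: Array[String]): Unit = {
>         val oom = new OutOfMemoryError("GC overhead limit exceeded")
>         try unwrap(new RuntimeException(oom))
>         catch { case m: scala.MatchError => println(s"got: $m") }
>       }
>     }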
> In our case, memory usage gradually increases over roughly two days and then
> levels off near the maximum heap size (4 GB for us). Within another 12-24
> hours, GC activity typically starts to climb, producing increasingly frequent
> errors like the stack trace above.
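> One way to confirm that pattern is over JMX. The sketch below assumes the
> history server JVM was started with remote JMX enabled and that
> history-server:9999 is its JMX endpoint (both placeholders); it just samples
> heap usage and cumulative GC time.
>
>     import java.lang.management.{GarbageCollectorMXBean, ManagementFactory, MemoryMXBean}
>     import javax.management.remote.{JMXConnectorFactory, JMXServiceURL}
>     import scala.collection.JavaConverters._
>
>     // Illustrative sketch: watch the history server's heap and GC time remotely.
>     object HeapWatcher {
>       def main(args: Array[String]): Unit = {
>         val url  = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://history-server:9999/jmxrmi")
>         val conn = JMXConnectorFactory.connect(url).getMBeanServerConnection
>         val heap = ManagementFactory.newPlatformMXBeanProxy(
>           conn, ManagementFactory.MEMORY_MXBEAN_NAME, classOf[MemoryMXBean])
>         val gcs  = ManagementFactory.getPlatformMXBeans(conn, classOf[GarbageCollectorMXBean]).asScala
>         while (true) {
>           val usedMb = heap.getHeapMemoryUsage.getUsed / (1024 * 1024)
>           val gcMs   = gcs.map(_.getCollectionTime).sum
>           println(s"heap used: $usedMb MB, cumulative GC time: $gcMs ms")
>           Thread.sleep(60000)               // sample once a minute
>         }
>       }
>     }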