[https://issues.apache.org/jira/browse/SPARK-19814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
Simon King updated SPARK-19814:
-------------------------------
Description:
Spark History Server runs out of memory, gets into GC thrashing, and eventually
becomes unresponsive. This seems to happen more quickly with heavy use of the
REST API. We've seen this with several versions of Spark.
Running with the following settings (Spark 2.1):
spark.history.fs.cleaner.enabled true
spark.history.fs.cleaner.interval 1d
spark.history.fs.cleaner.maxAge 7d
spark.history.retainedApplications 500
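As a stopgap while the underlying leak is unresolved, the History Server heap can be made explicit via `SPARK_DAEMON_MEMORY` in `conf/spark-env.sh`, and the UI cache can be shrunk. The values below are illustrative examples, not recommendations:

```shell
# conf/spark-env.sh -- example values only; tune for your deployment.
# SPARK_DAEMON_MEMORY sets the heap for Spark daemons,
# including the History Server (default is 1g).
export SPARK_DAEMON_MEMORY=8g

# In spark-defaults.conf, a smaller retainedApplications bounds the
# number of reconstructed SparkUI objects the ApplicationCache holds:
# spark.history.retainedApplications 50
```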
We will eventually get errors like:
17/02/25 05:02:19 WARN ServletHandler:
javax.servlet.ServletException: scala.MatchError: java.lang.OutOfMemoryError: GC overhead limit exceeded (of class java.lang.OutOfMemoryError)
	at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:489)
	at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:427)
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388)
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341)
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228)
	at org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:812)
	at org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:587)
	at org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
	at org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
	at org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
	at org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
	at org.spark_project.jetty.servlets.gzip.GzipHandler.handle(GzipHandler.java:529)
	at org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
	at org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
	at org.spark_project.jetty.server.Server.handle(Server.java:499)
	at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:311)
	at org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
	at org.spark_project.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
	at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
	at org.spark_project.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
	at java.lang.Thread.run(Thread.java:745)
Caused by: scala.MatchError: java.lang.OutOfMemoryError: GC overhead limit exceeded (of class java.lang.OutOfMemoryError)
	at org.apache.spark.deploy.history.ApplicationCache.getSparkUI(ApplicationCache.scala:148)
	at org.apache.spark.deploy.history.HistoryServer.getSparkUI(HistoryServer.scala:110)
	at org.apache.spark.status.api.v1.UIRoot$class.withSparkUI(ApiRootResource.scala:244)
	at org.apache.spark.deploy.history.HistoryServer.withSparkUI(HistoryServer.scala:49)
	at org.apache.spark.status.api.v1.ApiRootResource.getJobs(ApiRootResource.scala:66)
	at sun.reflect.GeneratedMethodAccessor102.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.glassfish.jersey.server.internal.routing.SubResourceLocatorRouter$1.run(SubResourceLocatorRouter.java:158)
	at org.glassfish.jersey.server.internal.routing.SubResourceLocatorRouter.getResource(SubResourceLocatorRouter.java:178)
	at org.glassfish.jersey.server.internal.routing.SubResourceLocatorRouter.apply(SubResourceLocatorRouter.java:109)
	at org.glassfish.jersey.server.internal.routing.RoutingStage._apply(RoutingStage.java:109)
	at org.glassfish.jersey.server.internal.routing.RoutingStage._apply(RoutingStage.java:112)
	at org.glassfish.jersey.server.internal.routing.RoutingStage._apply(RoutingStage.java:112)
	at org.glassfish.jersey.server.internal.routing.RoutingStage._apply(RoutingStage.java:112)
	at org.glassfish.jersey.server.internal.routing.RoutingStage._apply(RoutingStage.java:112)
	at org.glassfish.jersey.server.internal.routing.RoutingStage.apply(RoutingStage.java:92)
	at org.glassfish.jersey.server.internal.routing.RoutingStage.apply(RoutingStage.java:61)
	at org.glassfish.jersey.process.internal.Stages.process(Stages.java:197)
	at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:318)
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
	at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317)
	at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:305)
	at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1154)
	at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:473)
In our case, memory usage gradually increases over roughly two days, then
levels off near the maximum heap size (4 GB for us); often within the next
12-24 hours GC activity starts to climb, producing increasingly frequent
errors like the stack trace above.
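The `Caused by: scala.MatchError` line is worth noting: the OutOfMemoryError surfaces inside a pattern match that apparently covers only expected failure types, so the fatal error gets re-wrapped as a MatchError instead of propagating cleanly. A minimal sketch of that mechanism (hypothetical case arms, not the actual code in ApplicationCache.scala):

```scala
// Sketch only: a match over a Throwable cause that enumerates expected
// failures, with no catch-all arm. An unexpected fatal error such as
// OutOfMemoryError then falls through and is raised as scala.MatchError.
object MatchErrorSketch {
  def classify(cause: Throwable): String = cause match {
    case _: java.util.NoSuchElementException => "application not found"
    case _: java.io.IOException              => "event log read failure"
    // no `case _ =>` arm: any other Throwable triggers scala.MatchError
  }

  def main(args: Array[String]): Unit = {
    try classify(new OutOfMemoryError("GC overhead limit exceeded"))
    catch { case m: MatchError => println(s"re-wrapped as: $m") }
  }
}
```

This is why the servlet log shows a MatchError rather than the OutOfMemoryError itself, which makes the root cause easy to misread.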
> Spark History Server Out Of Memory / Extreme GC
> -----------------------------------------------
>
> Key: SPARK-19814
> URL: https://issues.apache.org/jira/browse/SPARK-19814
> Project: Spark
> Issue Type: Bug
> Components: Web UI
> Affects Versions: 1.6.1, 2.0.0, 2.1.0
> Environment: Spark History Server (we've run it on several different
> Hadoop distributions)
> Reporter: Simon King
>
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)