I have no other advice on this. Did the situation improve after you changed the parameters?
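
For reference, the settings discussed in this thread could be collected in spark-defaults.conf roughly as below. This is only a sketch: the first three values are the ones Jason reported, while the values for spark.ui.dagGraph.retainedRootRDDs and spark.ui.retainedDeadExecutors are illustrative assumptions, not recommendations, and would need tuning against your own heap dumps.

```properties
# Values reported in this thread (Spark 3.1.2)
spark.ui.retainedJobs                 10
spark.ui.retainedStages               10
spark.ui.retainedTasks                100

# Assumed values, for illustration only; no value was given in the thread
spark.ui.dagGraph.retainedRootRDDs    100
spark.ui.retainedDeadExecutors        10
```

The same keys can also be passed per application with `--conf key=value` on spark-submit, which makes it easier to try different limits on the long-running driver without touching the cluster defaults.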

Jason Jun <jaes...@gmail.com> wrote on Fri, Aug 5, 2022 at 06:55:

> Hi Qian,
>
> Thanks for your feedback. We're using Spark 3.1.2, with these set:
>
> spark.ui.retainedJobs 10
> spark.ui.retainedStages 10
> spark.ui.retainedTasks 100
>
> I'll set spark.ui.dagGraph.retainedRootRDDs as well.
>
> Any other advice for this?
>
> Thanks
> Jason
>
> On Wed, 3 Aug 2022 at 15:56, Qian Sun <qian.sun2...@gmail.com> wrote:
>
>> Hi Jason
>> LiveUI initializes ElementTrackingStore with InMemoryStore, so it has OOM
>> risk.
>>
>> /**
>>  * Create an in-memory store for a live application.
>>  */
>> def createLiveStore(
>>     conf: SparkConf,
>>     appStatusSource: Option[AppStatusSource] = None): AppStatusStore = {
>>   val store = new ElementTrackingStore(new InMemoryStore(), conf)
>>   val listener = new AppStatusListener(store, conf, true, appStatusSource)
>>   new AppStatusStore(store, listener = Some(listener))
>> }
>>
>> In addition to the parameters you mentioned, you can try to reduce the
>> following parameters:
>> * spark.ui.retainedTasks
>> * spark.ui.dagGraph.retainedRootRDDs
>>
>> If you have more information about this situation, it would be good.
>>
>> Best
>> Qian
>>
>>
>> On Aug 3, 2022, at 11:04 AM, Jason Jun <jaes...@gmail.com> wrote:
>>
>> Hi there,
>>
>> We have a Spark driver running 24x7, and we are continuously getting OOM in
>> the Spark driver every 10 days.
>> After analyzing a heap dump, I found that
>> org.apache.spark.status.ElementTrackingStore holds 85% of heap usage,
>> as in this image:
>> <image.png>
>>
>> I found in this JIRA ticket that these parameters could be the root cause:
>> https://issues.apache.org/jira/browse/SPARK-26395
>>
>> - spark.ui.retainedDeadExecutors
>> - spark.ui.retainedJobs
>> - spark.ui.retainedStages
>>
>>
>> But it didn't work. The OOM was only delayed from 1 week to 10 days with
>> these changes.
>>
>> It would be really appreciated if anyone can give me any solutions.
>>
>> Thanks
>> Jason
>>
>>

--
Best!
Qian SUN