[ https://issues.apache.org/jira/browse/SPARK-11700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15013158#comment-15013158 ]

Kostas papageorgopoulos commented on SPARK-11700:
-------------------------------------------------

Hi Shixiong Zhu. To my understanding the problem lies in the 
JobProgressListener.stageIdToData map. Even if I stop the JavaSparkContext and 
nullify it, long after that the heap space still looks like the attached picture 
https://issues.apache.org/jira/secure/attachment/12772004/SparkMemoryAfterLotsOfConsecutiveRuns.png
 .

My application has a configured heap of 1 GB. This heap is exhausted well before 
1000 consecutive Spark job runs, so I have to limit all the above configuration 
options to 5. According to the source code, {code} spark.ui.retainedStages {code} 
is what bounds the stageIdToData map. However, I would expect that once the 
JavaSparkContext is stopped and nullified, all of the relevant SparkContext 
objects would be garbage collected and free up my heap space. This does not 
happen, so I have to keep the JavaSparkContext alive inside my program and keep 
the above configuration options at a small number, as in the sketch below.
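For reference, this is roughly how I build the context with those limits capped. 
A minimal sketch assuming Spark 1.5: the app name, the local master and the 
value 5 are just my own choices, not anything Spark requires.

{code}
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SQLContext;

// Cap the UI listener history so jobIdToData / stageIdToData stay small.
SparkConf conf = new SparkConf()
        .setAppName("consecutive-sql-jobs")   // illustrative name
        .setMaster("local[*]")                // illustrative master
        .set("spark.ui.retainedJobs", "5")
        .set("spark.ui.retainedStages", "5");

JavaSparkContext javaSparkContext = new JavaSparkContext(conf);
SQLContext sqlContext = new SQLContext(javaSparkContext);
{code}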

What I would suggest is to add the following code to JobProgressListener: when 
the application end event is fired (i.e. the SparkContext is stopped), all the 
relevant maps of the JobProgressListener are cleared.

{code}
  override def onApplicationEnd(applicationEnd: SparkListenerApplicationEnd): Unit = synchronized {
    // Jobs:
    activeJobs.clear()
    completedJobs.clear()
    failedJobs.clear()
    jobIdToData.clear()
    jobGroupToJobIds.clear()

    // Stages:
    pendingStages.clear()
    activeStages.clear()
    completedStages.clear()
    skippedStages.clear()
    failedStages.clear()
    stageIdToData.clear()
    stageIdToInfo.clear()
    stageIdToActiveJobIds.clear()
    poolToActiveStages.clear()
  }
{code}
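Clearing these maps when the application end event fires would release the 
listener's retained UI data as soon as the SparkContext is stopped, instead of 
relying only on the spark.ui.retainedJobs / spark.ui.retainedStages trimming 
that runs while the context is still alive.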

> Memory leak at SparkContext jobProgressListener stageIdToData map
> -----------------------------------------------------------------
>
>                 Key: SPARK-11700
>                 URL: https://issues.apache.org/jira/browse/SPARK-11700
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 1.5.0, 1.5.1, 1.5.2
>         Environment: Ubuntu 14.04 LTS, Oracle JDK 1.8.51, Apache Tomcat 
> 8.0.28, Spring 4
>            Reporter: Kostas papageorgopoulos
>            Assignee: Shixiong Zhu
>            Priority: Critical
>              Labels: leak, memory-leak
>         Attachments: AbstractSparkJobRunner.java, 
> SparkContextPossibleMemoryLeakIDEA_DEBUG.png, SparkHeapSpaceProgress.png, 
> SparkMemoryAfterLotsOfConsecutiveRuns.png, 
> SparkMemoryLeakAfterLotsOfRunsWithinTheSameContext.png
>
>
> It seems that there is a SparkContext jobProgressListener memory leak. 
> Below I describe the steps I take to reproduce it. 
> I have created a Java webapp that runs some Spark SQL jobs which read data 
> from HDFS (joining them) and write the result to Elasticsearch using the 
> ES-Hadoop connector. After a lot of consecutive runs I noticed that my heap 
> space was full, so I got an out of heap space error.
> In the attached file {code} AbstractSparkJobRunner {code}, the method {code} 
> public final void run(T jobConfiguration, ExecutionLog executionLog) throws 
> Exception {code} runs each time a Spark SQL job is triggered, so I tried to 
> reuse the same SparkContext for a number of consecutive runs. When certain 
> rules apply, I try to clean up the SparkContext by first calling {code} 
> killSparkAndSqlContext {code}. This code eventually runs:
> {code}
> synchronized (sparkContextThreadLock) {
>     if (javaSparkContext != null) {
>         LOGGER.info("!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! CLEARING SPARK CONTEXT!!!!!!!!!!!!!!!!!!!!!!!!!!!");
>         javaSparkContext.stop();
>         javaSparkContext = null;
>         sqlContext = null;
>         System.gc();
>     }
>     numberOfRunningJobsForSparkContext.getAndSet(0);
> }
> {code}
> So at some point in time, when no other Spark SQL job needs to run, I kill 
> the SparkContext (AbstractSparkJobRunner.killSparkAndSqlContext runs) and I 
> would expect it to be garbage collected. However, this is not the case, even 
> though my debugger shows that my JavaSparkContext object is null; see the 
> attached picture {code} SparkContextPossibleMemoryLeakIDEA_DEBUG.png {code}.
> JVisualVM shows the heap usage growing steadily even when the garbage 
> collector is invoked; see the attached picture {code} SparkHeapSpaceProgress.png 
> {code}.
> The Memory Analyzer Tool shows that a big part of the retained heap is 
> assigned to _jobProgressListener; see the attached picture {code} 
> SparkMemoryAfterLotsOfConsecutiveRuns.png {code} and the summary picture {code} 
> SparkMemoryLeakAfterLotsOfRunsWithinTheSameContext.png {code}, even though at 
> the same time the JavaSparkContext in my singleton service is null.


