Kostas papageorgopoulos created SPARK-11700:
-----------------------------------------------
Summary: Possible memory leak at SparkContext jobProgressListener
stageIdToData map
Key: SPARK-11700
URL: https://issues.apache.org/jira/browse/SPARK-11700
Project: Spark
Issue Type: Question
Components: Spark Core, SQL
Affects Versions: 1.5.1, 1.5.0, 1.5.2
Environment: Ubuntu 14.04 LTS, Oracle JDK 1.8.51, Apache Tomcat 8.0.28, Spring 4
Reporter: Kostas papageorgopoulos
Priority: Minor
I have created a Java webapp that runs Spark SQL jobs which read data from HDFS, join it, and write the results to Elasticsearch using the ES-Hadoop connector. After a lot of consecutive runs I noticed that the heap space was full, so I got an out-of-heap-space error.
In the attached file {code}AbstractSparkJobRunner{code}, the method {code}public final void run(T jobConfiguration, ExecutionLog executionLog) throws Exception{code} runs each time a Spark SQL job is triggered, so I try to reuse the same SparkContext for a number of consecutive runs. When certain rules apply, I try to clean up the SparkContext by first calling {code}killSparkAndSqlContext{code}. That method eventually runs:
{code}
synchronized (sparkContextThreadLock) {
    if (javaSparkContext != null) {
        LOGGER.info("!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! CLEARING SPARK CONTEXT!!!!!!!!!!!!!!!!!!!!!!!!!!!");
        javaSparkContext.stop();
        javaSparkContext = null;
        sqlContext = null;
        System.gc();
    }
    numberOfRunningJobsForSparkContext.getAndSet(0);
}
{code}
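Since the attached {code}AbstractSparkJobRunner{code} is not inlined here, the following is only a rough sketch of the reuse pattern described above; the class layout and helper names are illustrative assumptions, not the exact attached code ({code}ExecutionLog{code} and {code}runJob{code} stand for the webapp's own pieces):
{code}
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SQLContext;

// Illustrative skeleton only: it mirrors the description above, not the attachment.
public abstract class AbstractSparkJobRunner<T> {

    private final Object sparkContextThreadLock = new Object();
    private final AtomicInteger numberOfRunningJobsForSparkContext = new AtomicInteger(0);
    private volatile JavaSparkContext javaSparkContext;
    private volatile SQLContext sqlContext;

    public final void run(T jobConfiguration, ExecutionLog executionLog) throws Exception {
        synchronized (sparkContextThreadLock) {
            if (javaSparkContext == null) {
                // Lazily create the shared context; master, app name and the
                // ES-Hadoop settings come from the job configuration in the real code.
                javaSparkContext = new JavaSparkContext(new SparkConf());
                sqlContext = new SQLContext(javaSparkContext);
            }
            numberOfRunningJobsForSparkContext.incrementAndGet();
        }
        try {
            runJob(sqlContext, jobConfiguration, executionLog); // job-specific Spark SQL work
        } finally {
            if (numberOfRunningJobsForSparkContext.decrementAndGet() <= 0) {
                killSparkAndSqlContext(); // runs the synchronized cleanup block shown above
            }
        }
    }

    protected abstract void runJob(SQLContext sqlContext, T jobConfiguration,
                                   ExecutionLog executionLog) throws Exception;

    protected void killSparkAndSqlContext() {
        // the synchronized cleanup block quoted above lives here
    }
}
{code}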
So at some point, if no other Spark SQL job needs to run, I kill the SparkContext and expect it to be garbage collected. However, this is not the case: even after {code}stop(){code}, the old SparkContext, and in particular the {code}stageIdToData{code} map of its {code}jobProgressListener{code}, is still retained on the heap, as can be seen in my debugger in the attached picture {code}SparkContextPossibleMemoryLeakIDEA_DEBUG.png{code}.
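To make that expectation concrete, this is roughly how one can check whether the stopped context ever becomes collectable; a minimal sketch only ({code}System.gc(){code} is just a hint to the JVM, so a heap dump or the debugger screenshot above is the more reliable evidence):
{code}
// Hold the context only weakly, drop all strong references, hint a GC,
// and then see whether the referent has been cleared.
java.lang.ref.WeakReference<JavaSparkContext> contextRef =
        new java.lang.ref.WeakReference<>(javaSparkContext);
javaSparkContext.stop();
javaSparkContext = null;
sqlContext = null;
System.gc();
Thread.sleep(1000); // give the collector a moment; not guaranteed to be enough
LOGGER.info("SparkContext collected after stop(): " + (contextRef.get() == null));
{code}
If the referent is never cleared, something still holds a strong reference to the stopped context, which is what the attached debugger screenshot suggests for {code}jobProgressListener.stageIdToData{code}.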