[
https://issues.apache.org/jira/browse/SPARK-11700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kostas papageorgopoulos updated SPARK-11700:
--------------------------------------------
Description:
I have created a Java webapp that runs Spark SQL jobs in an abstract way: the jobs read data from HDFS, join it, and write the result to Elasticsearch using the ES-Hadoop connector. After a lot of consecutive runs I noticed that my heap space filled up, so I got an out-of-heap-space error.
In the attached file {code}AbstractSparkJobRunner{code}, the method {code}public final void run(T jobConfiguration, ExecutionLog executionLog) throws Exception{code} runs each time a Spark SQL job is triggered, so I tried to reuse the same SparkContext for a number of consecutive runs. When certain rules apply, I try to clean up the SparkContext by first calling {code}killSparkAndSqlContext{code}, which eventually runs:
{code}
synchronized (sparkContextThreadLock) {
    if (javaSparkContext != null) {
        LOGGER.info("!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! CLEARING SPARK CONTEXT!!!!!!!!!!!!!!!!!!!!!!!!!!!");
        // Stop the context and drop our references so it can be collected.
        javaSparkContext.stop();
        javaSparkContext = null;
        sqlContext = null;
        // Hint (but not force) the JVM to run a collection.
        System.gc();
    }
    // Record that no jobs are using the shared context any more.
    numberOfRunningJobsForSparkContext.getAndSet(0);
}
{code}
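For reference, here is how the surrounding class and the fields used in the snippet above might be declared. Only the names come from this report; the types, imports, and the lazy-creation method are my assumptions, so treat this as a sketch rather than the actual attached code.
{code}
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SQLContext;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public abstract class AbstractSparkJobRunner<T> {
    private static final Logger LOGGER = LoggerFactory.getLogger(AbstractSparkJobRunner.class);

    // Guards every read and write of the shared context references below.
    private final Object sparkContextThreadLock = new Object();

    // How many jobs are currently using the shared context.
    private final AtomicInteger numberOfRunningJobsForSparkContext = new AtomicInteger(0);

    // Shared across consecutive runs; torn down by killSparkAndSqlContext().
    private JavaSparkContext javaSparkContext;
    private SQLContext sqlContext;

    // Lazily (re)creates the shared context when the next job needs one.
    protected JavaSparkContext getOrCreateContext(SparkConf conf) {
        synchronized (sparkContextThreadLock) {
            if (javaSparkContext == null) {
                javaSparkContext = new JavaSparkContext(conf);
                sqlContext = new SQLContext(javaSparkContext);
            }
            numberOfRunningJobsForSparkContext.incrementAndGet();
            return javaSparkContext;
        }
    }
}
{code}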
So at some point, if no other Spark SQL job needs to run, I kill the SparkContext and expect it to be reclaimed by the garbage collector. However, this is not the case, even though my debugger shows that my JavaSparkContext object is null; see the attached picture {code}SparkContextPossibleMemoryLeakIDEA_DEBUG.png{code}.
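Since the summary points at the jobProgressListener stageIdToData map, one mitigation worth trying while the context is alive is to cap how much completed job and stage data that listener retains for the UI; both settings default to 1000 in the 1.5.x line. A sketch, with arbitrary cap values:
{code}
SparkConf conf = new SparkConf()
    .setAppName("abstract-spark-sql-job-runner")
    // Cap the completed jobs/stages the UI listener keeps in memory.
    // Both default to 1000, which can add up over many consecutive runs.
    .set("spark.ui.retainedJobs", "50")
    .set("spark.ui.retainedStages", "200");
{code}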
> Possible memory leak at SparkContext jobProgressListener stageIdToData map
> --------------------------------------------------------------------------
>
> Key: SPARK-11700
> URL: https://issues.apache.org/jira/browse/SPARK-11700
> Project: Spark
> Issue Type: Question
> Components: Spark Core, SQL
> Affects Versions: 1.5.0, 1.5.1, 1.5.2
>                Environment: Ubuntu 14.04 LTS, Oracle JDK 1.8.51, Apache Tomcat 8.0.28, Spring 4
> Reporter: Kostas papageorgopoulos
> Priority: Minor
> Labels: leak, memory-leak