[jira] [Comment Edited] (SPARK-10529) When creating multiple HiveContext objects in one jvm, jdbc connections to metastore cann't be released and it may cause PermGen OutOfMemoryError.

ZhengYaofeng (JIRA) Thu, 10 Sep 2015 03:15:00 -0700

[ https://issues.apache.org/jira/browse/SPARK-10529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738541#comment-14738541 ]


ZhengYaofeng edited comment on SPARK-10529 at 9/10/15 10:13 AM:
----------------------------------------------------------------

I found that the IsolatedClientLoader class has an attribute called
'classLoader', which is a URLClassLoader object. So every new
IsolatedClientLoader object creates a new URLClassLoader object. As you
know, one URLClassLoader does not see the classes loaded by another (sibling)
URLClassLoader, so each one loads its own copy of the same set of classes,
including ClientWrapper, ReflectionMagic and so on. That is what happens in
my test program: it creates multiple HiveContext objects, which means multiple
IsolatedClientLoader objects and multiple URLClassLoader objects are created.
The IsolatedClientLoader and URLClassLoader objects themselves may be
released, but the classes loaded by the different URLClassLoader objects are
not.

I made a patch to resolve this problem: it stores the URLClassLoader object in
a ThreadLocal variable. Every time you create a new HiveContext object (on the
same thread), it reuses the same classLoader, which finds the classes it has
already loaded. This way you no longer have to worry about jdbc connections not
being released or about classes being loaded repeatedly.
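A minimal Scala sketch of the ThreadLocal idea, written in the same language as the issue's test code. It is not the attached IsolatedClientLoader.scala patch and does not touch Spark internals: the object `SharedIsolatedLoader` and the `hiveJars` list are hypothetical names, and the jar list is left empty for illustration.

```scala
import java.net.{URL, URLClassLoader}

// Hypothetical sketch of "keep the URLClassLoader in a ThreadLocal";
// not the attached patch, and not Spark's actual IsolatedClientLoader.
object SharedIsolatedLoader {
  // Assumption: the Hive/Hadoop jars the isolated client needs. Empty here.
  val hiveJars: Array[URL] = Array.empty[URL]

  // One URLClassLoader per thread. The parent is null (bootstrap loader), so
  // classes found in hiveJars are loaded by this loader, not the app loader.
  private val perThread = new ThreadLocal[URLClassLoader] {
    override def initialValue(): URLClassLoader = new URLClassLoader(hiveJars, null)
  }

  // Every caller on the same thread gets the same loader back, so classes it
  // has already loaded are found instead of being loaded a second time.
  def loader: URLClassLoader = perThread.get()
}
```

Note the scope of a ThreadLocal: each thread gets its own loader, so HiveContext objects created from different threads would still load separate copies of the classes. A single process-wide loader (for example a `lazy val`) would avoid that, at the cost of sharing one loader across all threads.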


was (Author: gavingavinno1): identical to the comment above, except "create a HiveContext object" instead of "create a new HiveContext object".

> When creating multiple HiveContext objects in one jvm, jdbc connections to 
> metastore cann't be released and it may cause PermGen OutOfMemoryError.
> --------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-10529
>                 URL: https://issues.apache.org/jira/browse/SPARK-10529
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.4.1
>            Reporter: ZhengYaofeng
>         Attachments: IsolatedClientLoader.scala
>
>
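> Test code as follows:
> import org.apache.spark.{SparkConf, SparkContext}
> import org.apache.spark.sql.hive.HiveContext
>
> object SqlTest {
>   def main(args: Array[String]): Unit = {
>     // Build a fresh SparkContext against the standalone master.
>     def createSc: SparkContext = {
>       val sparkConf = new SparkConf().setAppName("SqlTest")
>         .setMaster("spark://zdh221:7077")
>         .set("spark.executor.memory", "4g")
>         .set("spark.executor.cores", "2")
>         .set("spark.cores.max", "6")
>       new SparkContext(sparkConf)
>     }
>     // Create, use and stop 200 HiveContext objects in the same JVM.
>     for (index <- 1 to 200) {
>       println(s"============Current Index:${index}=============")
>       val hc = new HiveContext(createSc)
>       hc.sql("show databases").collect().foreach(println)
>       // Per this report, the metastore jdbc connection is not released here.
>       hc.sparkContext.stop()
>       Thread.sleep(3000)
>     }
>     // Keep the JVM alive so connections and PermGen usage can be inspected.
>     Thread.sleep(1000000)
>   }
> }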
> Tested on Spark 1.4.1 with the following commands:
>       export CLASSPATH="$CLASSPATH:/home/hadoop/spark/conf:/home/hadoop/spark/lib/*:/home/hadoop/zyf/lib/*"
>       java -Xmx8096m -Xms1024m -XX:MaxPermSize=1024m -cp $CLASSPATH SqlTest
> Files list:
>       /home/hadoop/spark/conf: core-site.xml, hdfs-site.xml, hive-site.xml, slaves, spark-defaults.conf, spark-env.sh
>       /home/hadoop/zyf/lib: hadoop-lzo-0.4.20.jar, mysql-connector-java-5.1.28-bin.jar, sqltest-1.0-SNAPSHOT.jar
> MySQL is used as the metastore. While the test app is running, you can clearly
> see the number of jdbc connections to MySQL grow steadily with the command
> "show status like 'Threads_connected';". Even invoking 'Hive.closeCurrent()'
> does not release the current jdbc connections, and I could not find any other
> way to release them. When testing with Spark 1.3.1, the number of jdbc
> connections does not grow.
> Meanwhile, the test ends with 'java.lang.OutOfMemoryError: PermGen space' after
> 45 iterations, i.e. after 45 HiveContext objects have been created.
> Interestingly, with MaxPermSize set to '2048m' it runs for 93 iterations, and
> with '3072m' for 141 iterations. This indicates that each new HiveContext
> object loads the same set of classes again and that they are never released.
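The three reported data points (45, 93 and 141 iterations at MaxPermSize 1024m, 2048m and 3072m) lie exactly on a straight line, which supports the conclusion above. A small Scala sketch of that arithmetic, assuming PermGen use grows linearly with the number of HiveContext objects: every extra 1024 MB buys 48 more iterations, so each HiveContext appears to cost roughly 21.3 MB (1024 / 48) of PermGen, and the line's intercept suggests about 64 MB is already in use before the loop starts. `PermGenEstimate` is a hypothetical name used only for this illustration.

```scala
// Figures reported in the issue: MaxPermSize in MB -> HiveContext objects
// created before 'java.lang.OutOfMemoryError: PermGen space'.
object PermGenEstimate {
  val observations: Seq[(Double, Double)] =
    Seq((1024.0, 45.0), (2048.0, 93.0), (3072.0, 141.0))

  // Assumption: PermGen use grows linearly with the number of HiveContexts.
  // Slope of the line through the first and last points: extra cycles per MB.
  val cyclesPerMb: Double = {
    val (x0, y0) = observations.head
    val (x1, y1) = observations.last
    (y1 - y0) / (x1 - x0)
  }

  // Inverse of the slope: approximate PermGen (MB) consumed per HiveContext.
  val mbPerContext: Double = 1.0 / cyclesPerMb

  // Intercept of the fitted line: PermGen (MB) in use before the first cycle.
  val baselineMb: Double = observations.head._1 - observations.head._2 * mbPerContext

  // Cycles the fitted line predicts for a given MaxPermSize (MB).
  def predictedCycles(maxPermMb: Double): Double = (maxPermMb - baselineMb) / mbPerContext

  def main(args: Array[String]): Unit =
    println(f"~$mbPerContext%.1f MB per HiveContext, ~$baselineMb%.0f MB baseline")
}
```

Checking `predictedCycles` against the middle point (2048m) is a quick way to confirm the linear assumption holds for all three measurements.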



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
