[ https://issues.apache.org/jira/browse/SPARK-34689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17299603#comment-17299603 ]
Dimitris Batis commented on SPARK-34689:
----------------------------------------
After further examination, this seems to be related to SPARK-34087. Based on
the pull requests on that ticket, I added
ctx.sparkSession.listenerManager.clearListenerBus() to
SparkSQLSessionManager#closeSession(), as in the attached "git diff" file, and
in local tests SparkSession objects are now released properly. I am not sure
whether this is a complete solution or whether it has side effects.
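For reference, the change is essentially the following. This is only a sketch: the exact diff is in the attached test_patch.diff, the surrounding closeSession() body is abbreviated, and clearListenerBus() is assumed to be the helper (modeled on the SPARK-34087 pull requests) that unregisters the session's ExecutionListenerBus from the shared LiveListenerBus.
{code}
// SparkSQLSessionManager.scala -- sketch of the patched closeSession();
// abbreviated, see the attached test_patch.diff for the actual change.
override def closeSession(sessionHandle: SessionHandle): Unit = {
  val ctx = sparkSqlOperationManager.sessionToContexts
    .getOrDefault(sessionHandle, SQLContext.getOrCreate(sparkContext))
  // Added call: detach this session's ExecutionListenerBus from the shared
  // LiveListenerBus, so the bus no longer holds a strong reference to the
  // SparkSession once the JDBC connection is closed.
  ctx.sparkSession.listenerManager.clearListenerBus()
  super.closeSession(sessionHandle)
  sparkSqlOperationManager.sessionToContexts.remove(sessionHandle)
}
{code}
A quick way to verify the effect is a live heap histogram before and after closing a batch of connections, e.g. jmap -histo:live <pid> | grep SparkSession.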
> Spark Thrift Server: Memory leak for SparkSession objects
> ---------------------------------------------------------
>
> Key: SPARK-34689
> URL: https://issues.apache.org/jira/browse/SPARK-34689
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, SQL
> Affects Versions: 3.0.1, 3.1.1
> Reporter: Dimitris Batis
> Priority: Major
> Attachments: heap_sparksession.png,
> heapdump_local_attempt_250_closed_connections.png, test_patch.diff
>
>
> When running the Spark Thrift Server (3.0.1, standalone cluster), we have
> noticed that each new JDBC connection creates a new SparkSession object. This
> object (and anything it references), however, remains in memory indefinitely
> even after the JDBC connection is closed, and full GCs do not remove it.
> After about 18 hours of heavy use, we accumulate more than 46,000 such
> objects (heap_sparksession.png).
> In a small local installation test, I replicated the behavior by simply
> opening a JDBC connection, executing SHOW SCHEMAS and closing the connection
> (heapdump_local_attempt.png). For each connection, a new SparkSession object
> is created and never removed. I have noticed the same behavior in Spark 3.1.1
> as well.
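> A minimal driver along these lines reproduces it (a sketch only: host, port
> and credentials are placeholders, and the Hive JDBC driver must be on the
> classpath):
> {code}
> // Sketch: open a connection, run SHOW SCHEMAS, close the connection -- repeatedly.
> // After this loop, the server retains one SparkSession per closed connection.
> import java.sql.DriverManager
>
> object ThriftLeakRepro {
>   def main(args: Array[String]): Unit = {
>     for (_ <- 1 to 250) {
>       val conn = DriverManager.getConnection(
>         "jdbc:hive2://localhost:10000/default", "user", "")
>       val stmt = conn.createStatement()
>       val rs = stmt.executeQuery("SHOW SCHEMAS")
>       while (rs.next()) { /* drain the result set */ }
>       rs.close(); stmt.close(); conn.close()
>     }
>   }
> }
> {code}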
> Our settings are as follows. Please note that this was occurring even before
> we added the ExplicitGCInvokesConcurrent option (i.e. it happened even when a
> full GC was performed every 20 minutes).
> spark-defaults.conf:
> {code}
> spark.master spark://...:7077,...:7077
> spark.master.rest.enabled true
> spark.eventLog.enabled false
> spark.eventLog.dir file:///...
> spark.driver.cores 1
> spark.driver.maxResultSize 4g
> spark.driver.memory 5g
> spark.executor.memory 1g
> spark.executor.logs.rolling.maxRetainedFiles 2
> spark.executor.logs.rolling.strategy size
> spark.executor.logs.rolling.maxSize 1G
> spark.local.dir ...
> spark.sql.ui.retainedExecutions=10
> spark.ui.retainedDeadExecutors=10
> spark.worker.ui.retainedExecutors=10
> spark.worker.ui.retainedDrivers=10
> spark.ui.retainedJobs=30
> spark.ui.retainedStages=100
> spark.ui.retainedTasks=500
> spark.appStateStore.asyncTracking.enable=false
> spark.sql.shuffle.partitions=200
> spark.default.parallelism=200
> spark.task.reaper.enabled=true
> spark.task.reaper.threadDump=false
> spark.memory.offHeap.enabled=true
> spark.memory.offHeap.size=4g
> {code}
> spark-env.sh:
> {code}
> HADOOP_CONF_DIR="/.../hadoop/etc/hadoop"
> SPARK_WORKER_CORES=28
> SPARK_WORKER_MEMORY=54g
> SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true
> -Dspark.worker.cleanup.appDataTtl=172800 -XX:+UseG1GC
> -XX:MaxGCPauseMillis=200 -XX:InitiatingHeapOccupancyPercent=40 "
> SPARK_DAEMON_JAVA_OPTS="-Dlog4j.configuration=file:///.../log4j.properties
> -Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.dir="..."
> -Dspark.deploy.zookeeper.url=...:2181,...:2181,...:2181 -XX:+UseG1GC
> -XX:MaxGCPauseMillis=200 -XX:InitiatingHeapOccupancyPercent=40"
> {code}
> start-thriftserver.sh:
> {code}
> export SPARK_DAEMON_MEMORY=16g
> exec "${SPARK_HOME}"/sbin/spark-daemon.sh submit $CLASS 1 \
> --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
> --conf "spark.ui.retainedJobs=30" \
> --conf "spark.ui.retainedStages=100" \
> --conf "spark.ui.retainedTasks=500" \
> --conf "spark.sql.ui.retainedExecutions=10" \
> --conf "spark.appStateStore.asyncTracking.enable=false" \
> --conf "spark.cleaner.periodicGC.interval=20min" \
> --conf "spark.sql.autoBroadcastJoinThreshold=-1" \
> --conf "spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails
> -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UseG1GC
> -XX:MaxGCPauseMillis=200" \
> --conf "spark.driver.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails
> -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps
> -Xloggc:/.../thrift_driver_gc.log -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles=7 -XX:GCLogFileSize=35M -XX:+UseG1GC
> -XX:MaxGCPauseMillis=200 -Dcom.sun.management.jmxremote
> -Dcom.sun.management.jmxremote.authenticate=false
> -Dcom.sun.management.jmxremote.ssl=false
> -Dcom.sun.management.jmxremote.port=11990 -XX:+ExplicitGCInvokesConcurrent" \
> --conf "spark.metrics.namespace=..." --name "..." --packages
> io.delta:delta-core_2.12:0.7.0 --hiveconf spark.ui.port=4038 --hiveconf
> spark.cores.max=22 --hiveconf spark.executor.cores=3 --hiveconf
> spark.executor.memory=6144M --hiveconf spark.scheduler.mode=FAIR --hiveconf
> spark.scheduler.allocation.file=.../conf/thrift-scheduler.xml \
> --conf spark.sql.thriftServer.incrementalCollect=true "$@"
> {code}