[ https://issues.apache.org/jira/browse/SPARK-34689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17299603#comment-17299603 ]
Dimitris Batis commented on SPARK-34689:
----------------------------------------
After further examination, this seems to be related to SPARK-34087. Based on
the pull requests on that ticket, I added
ctx.sparkSession.listenerManager.clearListenerBus() to
SparkSQLSessionManager#closeSession(), as in the attached "git diff" file, and
in local tests SparkSession objects are now released properly. I am not sure
whether this is a complete solution or whether it has side effects.
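For reference, the change is essentially the following. This is only a sketch: the exact diff is in the attached test_patch.diff, the surrounding closeSession() body is abbreviated, and clearListenerBus() is assumed to be the helper (modeled on the SPARK-34087 pull requests) that unregisters the session's ExecutionListenerBus from the shared LiveListenerBus.
{code}
// SparkSQLSessionManager.scala -- sketch of the patched closeSession();
// abbreviated, see the attached test_patch.diff for the actual change.
override def closeSession(sessionHandle: SessionHandle): Unit = {
  val ctx = sparkSqlOperationManager.sessionToContexts
    .getOrDefault(sessionHandle, SQLContext.getOrCreate(sparkContext))
  // Added call: detach this session's ExecutionListenerBus from the shared
  // LiveListenerBus, so the bus no longer holds a strong reference to the
  // SparkSession once the JDBC connection is closed.
  ctx.sparkSession.listenerManager.clearListenerBus()
  super.closeSession(sessionHandle)
  sparkSqlOperationManager.sessionToContexts.remove(sessionHandle)
}
{code}
A quick way to verify the effect is a live heap histogram before and after closing a batch of connections, e.g. jmap -histo:live <pid> | grep SparkSession.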
> Spark Thrift Server: Memory leak for SparkSession objects
> ---------------------------------------------------------
>
> Key: SPARK-34689
> URL: https://issues.apache.org/jira/browse/SPARK-34689
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, SQL
> Affects Versions: 3.0.1, 3.1.1
> Reporter: Dimitris Batis
> Priority: Major
> Attachments: heap_sparksession.png,
> heapdump_local_attempt_250_closed_connections.png, test_patch.diff
>
>
> When running the Spark Thrift Server (3.0.1, standalone cluster), we have
> noticed that each new JDBC connection creates a new SparkSession object. This
> object (and anything it references), however, remains in memory indefinitely
> even after the JDBC connection is closed, and full GCs do not remove it.
> After about 18 hours of heavy use, we accumulate more than 46,000 such
> objects (heap_sparksession.png).
> In a small local installation test, I replicated the behavior by simply
> opening a JDBC connection, executing SHOW SCHEMAS and closing the connection
> (heapdump_local_attempt.png). For each connection, a new SparkSession object
> is created and never removed. I have noticed the same behavior in Spark 3.1.1
> as well.
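> A minimal driver along these lines reproduces it (a sketch only: host, port
> and credentials are placeholders, and the Hive JDBC driver must be on the
> classpath):
> {code}
> // Sketch: open a connection, run SHOW SCHEMAS, close the connection -- repeatedly.
> // After this loop, the server retains one SparkSession per closed connection.
> import java.sql.DriverManager
>
> object ThriftLeakRepro {
>   def main(args: Array[String]): Unit = {
>     for (_ <- 1 to 250) {
>       val conn = DriverManager.getConnection(
>         "jdbc:hive2://localhost:10000/default", "user", "")
>       val stmt = conn.createStatement()
>       val rs = stmt.executeQuery("SHOW SCHEMAS")
>       while (rs.next()) { /* drain the result set */ }
>       rs.close(); stmt.close(); conn.close()
>     }
>   }
> }
> {code}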
> Our settings are as follows. Please note that this was occurring even before
> we added the ExplicitGCInvokesConcurrent option (i.e. it happened even when a
> full GC was performed every 20 minutes).
> spark-defaults.conf:
> {code}
> spark.master spark://...:7077,...:7077
> spark.master.rest.enabled true
> spark.eventLog.enabled false
> spark.eventLog.dir file:///...
> spark.driver.cores 1
> spark.driver.maxResultSize 4g
> spark.driver.memory 5g
> spark.executor.memory 1g
> spark.executor.logs.rolling.maxRetainedFiles 2
> spark.executor.logs.rolling.strategy size
> spark.executor.logs.rolling.maxSize 1G
> spark.local.dir ...
> spark.sql.ui.retainedExecutions=10
> spark.ui.retainedDeadExecutors=10
> spark.worker.ui.retainedExecutors=10
> spark.worker.ui.retainedDrivers=10
> spark.ui.retainedJobs=30
> spark.ui.retainedStages=100
> spark.ui.retainedTasks=500
> spark.appStateStore.asyncTracking.enable=false
> spark.sql.shuffle.partitions=200
> spark.default.parallelism=200
> spark.task.reaper.enabled=true
> spark.task.reaper.threadDump=false
> spark.memory.offHeap.enabled=true
> spark.memory.offHeap.size=4g
> {code}
> spark-env.sh:
> {code}
> HADOOP_CONF_DIR="/.../hadoop/etc/hadoop"
> SPARK_WORKER_CORES=28
> SPARK_WORKER_MEMORY=54g
> SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true
> -Dspark.worker.cleanup.appDataTtl=172800 -XX:+UseG1GC
> -XX:MaxGCPauseMillis=200 -XX:InitiatingHeapOccupancyPercent=40 "
> SPARK_DAEMON_JAVA_OPTS="-Dlog4j.configuration=file:///.../log4j.properties
> -Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.dir="..."
> -Dspark.deploy.zookeeper.url=...:2181,...:2181,...:2181 -XX:+UseG1GC
> -XX:MaxGCPauseMillis=200 -XX:InitiatingHeapOccupancyPercent=40"
> {code}
> start-thriftserver.sh:
> {code}
> export SPARK_DAEMON_MEMORY=16g
> exec "${SPARK_HOME}"/sbin/spark-daemon.sh submit $CLASS 1 \
> --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
> --conf "spark.ui.retainedJobs=30" \
> --conf "spark.ui.retainedStages=100" \
> --conf "spark.ui.retainedTasks=500" \
> --conf "spark.sql.ui.retainedExecutions=10" \
> --conf "spark.appStateStore.asyncTracking.enable=false" \
> --conf "spark.cleaner.periodicGC.interval=20min" \
> --conf "spark.sql.autoBroadcastJoinThreshold=-1" \
> --conf "spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails
> -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UseG1GC
> -XX:MaxGCPauseMillis=200" \
> --conf "spark.driver.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails
> -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps
> -Xloggc:/.../thrift_driver_gc.log -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles=7 -XX:GCLogFileSize=35M -XX:+UseG1GC
> -XX:MaxGCPauseMillis=200 -Dcom.sun.management.jmxremote
> -Dcom.sun.management.jmxremote.authenticate=false
> -Dcom.sun.management.jmxremote.ssl=false
> -Dcom.sun.management.jmxremote.port=11990 -XX:+ExplicitGCInvokesConcurrent" \
> --conf "spark.metrics.namespace=..." --name "..." --packages
> io.delta:delta-core_2.12:0.7.0 --hiveconf spark.ui.port=4038 --hiveconf
> spark.cores.max=22 --hiveconf spark.executor.cores=3 --hiveconf
> spark.executor.memory=6144M --hiveconf spark.scheduler.mode=FAIR --hiveconf
> spark.scheduler.allocation.file=.../conf/thrift-scheduler.xml \
> --conf spark.sql.thriftServer.incrementalCollect=true "$@"
> {code}