[
https://issues.apache.org/jira/browse/SPARK-29055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16941736#comment-16941736
]
George Papa commented on SPARK-29055:
-------------------------------------
[~kabhwan]
First of all, thank you for your time on this issue!
As I described in the initial description, after a long run I start seeing
spills to disk. So besides the bug in the UI memory usage report, which looks
to be fixed by [GitHub Pull Request
#25973|https://github.com/apache/spark/pull/25973], there may be another hidden
bug.
I ran the code in the snippet (without any sleep time, in order to see the
results faster) and recorded the JVM memory usage for approximately 1 hour,
comparing Spark 2.4.4 against your branch with the patch.
Spark JVM memory with Spark 2.4.4:
||Time||RES||SHR||MEM%||
|1min|{color:#de350b}1349{color}|32724|1.5|
|3min|{color:#de350b}1936{color}|32724|2.2|
|5min|{color:#de350b}2506{color}|32724|2.6|
|7min|{color:#de350b}2564{color}|32724|2.7|
|9min|{color:#de350b}2584{color}|32724|2.7|
|11min|{color:#de350b}2585{color}|32724|2.7|
|13min|{color:#de350b}2592{color}|32724|2.7|
|15min|{color:#de350b}2591{color}|32724|2.7|
|17min|{color:#de350b}2591{color}|32724|2.7|
|30min|{color:#de350b}2600{color}|32724|2.7|
|1h|{color:#de350b}2618{color}|32724|2.7|
Spark JVM memory with the patch from [GitHub Pull Request
#25973|https://github.com/apache/spark/pull/25973]:
||Time||RES||SHR||MEM%||
|1min|{color:#de350b}1134{color}|25380|1.4|
|3min|{color:#de350b}1520{color}|25380|1.6|
|5min|{color:#de350b}1570{color}|25380|1.6|
|7min|{color:#de350b}1598{color}|25380|1.7|
|9min|{color:#de350b}1613{color}|25380|1.7|
|11min|{color:#de350b}1616{color}|25380|1.7|
|15min|{color:#de350b}1620{color}|25380|1.7|
|17min|{color:#de350b}1625{color}|25380|1.7|
|30min|{color:#de350b}1629{color}|25380|1.7|
|1h|{color:#de350b}1660{color}|25380|1.7|
As you can see, the RES memory increases slightly over time in both cases.
Also, when I tested with a real streaming application in a testing environment,
after some hours the persisted dataframes overflowed the memory and spilled to
disk.
* You can easily reproduce the above behavior by running the snippet code (I
prefer to run it without any sleep delay) and tracking the JVM memory with the
top or htop command, e.g. with the sketch below.
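For reference, a minimal sketch of a script that logs the same RES figure that
top/htop shows, instead of reading it off by hand (this assumes a Linux /proc
filesystem; the script name, the PID argument and the 60s sampling interval are
only illustrative):
{code:python}
# track_res.py: log the RES (VmRSS) of a process over time.
# Usage: python track_res.py <pid-of-spark-driver-jvm>
import sys
import time

def res_kb(pid):
    # VmRSS in /proc/<pid>/status is the resident set size in kB,
    # i.e. the RES column shown by top/htop.
    with open("/proc/{0}/status".format(pid)) as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])

pid = sys.argv[1]
while True:
    print("t={0} RES={1} kB".format(int(time.time()), res_kb(pid)))
    time.sleep(60)  # sample once a minute
{code}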
> Memory leak in Spark
> --------------------
>
> Key: SPARK-29055
> URL: https://issues.apache.org/jira/browse/SPARK-29055
> Project: Spark
> Issue Type: Bug
> Components: Block Manager, Spark Core
> Affects Versions: 2.3.3
> Reporter: George Papa
> Priority: Major
> Attachments: test_csvs.zip
>
>
> I used Spark 2.1.1 and upgraded to newer versions. After Spark version
> 2.3.3, I observed from the Spark UI that the driver memory is
> {color:#ff0000}increasing continuously{color}.
> In more detail, the driver and the executors show the same used storage
> memory, and after each iteration the storage memory increases. You can
> reproduce this behavior by running the following snippet code. The following
> example is very simple, without any dataframe persistence, but the memory
> consumption is not stable as it was in former Spark versions (specifically,
> up to Spark 2.3.2).
> Also, I tested with the Spark Streaming and Structured Streaming APIs and saw
> the same behavior. I tested with an existing application which reads from a
> Kafka source, does some aggregations, persists dataframes and then unpersists
> them. The persist and unpersist work correctly: I see the dataframes in the
> Storage tab in the Spark UI, and after the unpersist all dataframes are
> removed. But after the unpersist the executors' memory is not zero; it has
> the same value as the driver memory. This behavior also affects the
> application performance, because the executors' memory grows as the driver's
> does, and after a while the persisted dataframes no longer fit in the
> executors' memory and I get spill to disk.
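> For illustration, a minimal sketch of the persist/unpersist pattern described
> above (the data source and the aggregation are illustrative stand-ins, not
> the real Kafka application):
> {code:python}
> from pyspark.sql import SparkSession
> from pyspark.sql import functions as F
>
> spark = SparkSession.builder.appName("PersistUnpersistPattern").getOrCreate()
>
> # Illustrative stand-in for the Kafka-sourced dataframe in the real app.
> df = spark.range(0, 1000000).withColumn("bucket", F.col("id") % 10)
>
> cached = df.persist()                    # appears in the Storage tab
> cached.groupBy("bucket").count().show()  # some aggregation on the cached data
>
> cached.unpersist()  # removed from the Storage tab, but the executors'
>                     # used memory in the UI does not drop back to zero
> {code}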
> Another error which I got after a long run was
> {color:#ff0000}java.lang.OutOfMemoryError: GC overhead limit exceeded{color},
> but I don't know whether it is related to the above behavior or not.
>
> *HOW TO REPRODUCE THIS BEHAVIOR:*
> Create a very simple application (streaming_count_file.py) in order to
> reproduce this behavior. This application reads CSV files from a directory,
> counts the rows and then removes the processed files.
> {code:python}
> import os
> import time
>
> from pyspark.sql import SparkSession
>
> target_dir = "..."
>
> spark = SparkSession.builder.appName("DataframeCount").getOrCreate()
>
> while True:
>     for f in os.listdir(target_dir):
>         path = os.path.join(target_dir, f)
>         df = spark.read.load(path, format="csv")
>         print("Number of records: {0}".format(df.count()))
>         os.remove(path)  # remove the processed file
>         time.sleep(15)
> {code}
> Submit code:
> {code:bash}
> spark-submit \
>   --master spark://xxx.xxx.xx.xxx \
>   --deploy-mode client \
>   --executor-memory 4g \
>   --executor-cores 3 \
>   streaming_count_file.py
> {code}
>
> *TESTED CASES WITH THE SAME BEHAVIOUR:*
> * I tested with the default settings (spark-defaults.conf)
> * Set {{spark.cleaner.periodicGC.interval}} to 1min (or less)
> * Set {{spark.cleaner.referenceTracking.blocking}}=false (see the sketch
> after this list)
> * Ran the application in cluster mode
> * Increased/decreased the resources of the executors and driver
> * Tested with extraJavaOptions on the driver and executors: -XX:+UseG1GC
> -XX:InitiatingHeapOccupancyPercent=35 -XX:ConcGCThreads=12
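> For reference, a sketch of how the two cleaner settings above can be applied
> when building the session (the values are the ones listed; everything else is
> as in the reproduction snippet):
> {code:python}
> from pyspark.sql import SparkSession
>
> spark = (SparkSession.builder
>          .appName("DataframeCount")
>          # trigger a driver GC (and thus ContextCleaner work) every minute
>          .config("spark.cleaner.periodicGC.interval", "1min")
>          # don't block the cleaning thread on individual cleanup tasks
>          .config("spark.cleaner.referenceTracking.blocking", "false")
>          .getOrCreate())
> {code}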
>
> *DEPENDENCIES*
> * Operating system: Ubuntu 16.04.3 LTS
> * Java: jdk1.8.0_131 (tested also with jdk1.8.0_221)
> * Python: Python 2.7.12
>
> *NOTE:* In Spark 2.1.1 the driver memory consumption (Storage Memory tab) was
> extremely low, and after the ContextCleaner and BlockManager ran, the memory
> decreased.