hosein created SPARK-20352:
------------------------------

             Summary: PySpark SparkSession initialization takes longer every 
iteration in a single application
                 Key: SPARK-20352
                 URL: https://issues.apache.org/jira/browse/SPARK-20352
             Project: Spark
          Issue Type: Question
          Components: PySpark
    Affects Versions: 2.1.0
         Environment: Linux Ubuntu 12
pyspark
            Reporter: hosein
             Fix For: 2.1.0


I run Spark on a standalone Ubuntu server with 128 GB of memory and a 32-core 
CPU, launching spark-submit my_code.py without any additional configuration 
parameters.
In a while loop I start a SparkSession, analyze data, and then stop the 
context; this process repeats every 10 seconds.

#####################
from pyspark.sql import SparkSession

while True:
    spark = SparkSession.builder \
        .appName("sync_task") \
        .config('spark.driver.maxResultSize', '5g') \
        .getOrCreate()
    sc = spark.sparkContext
    # some process and analyze
    spark.stop()
#######################
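One way to pin down where the time goes is to measure the setup step on every 
iteration of the loop described above. A minimal timing harness is sketched 
below; the lambda placeholders stand in for the real PySpark calls 
(getOrCreate(), the analysis, spark.stop()) so the sketch runs without a 
cluster, and the helper name timed_iterations is ours, not a Spark API:

```python
import time

def timed_iterations(setup, work, teardown, n_iters):
    """Run the setup/work/teardown loop n_iters times and return the
    wall-clock seconds spent in setup() for each iteration."""
    setup_times = []
    for _ in range(n_iters):
        t0 = time.monotonic()
        ctx = setup()          # e.g. SparkSession.builder...getOrCreate()
        setup_times.append(time.monotonic() - t0)
        work(ctx)              # the "some process and analyze" step
        teardown(ctx)          # e.g. spark.stop()
    return setup_times

# Placeholder callables; swap in the real PySpark calls to measure.
times = timed_iterations(setup=lambda: object(),
                         work=lambda ctx: None,
                         teardown=lambda ctx: None,
                         n_iters=5)
```

If the per-iteration setup times in the returned list grow over hours, that 
localizes the slowdown to session initialization rather than the analysis step.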

When the program starts, it works perfectly.

But after it has been running for many hours, Spark initialization takes a 
long time: 10 or 20 seconds just to initialize Spark.

So what is the problem?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
