hosein created SPARK-20352:
------------------------------
Summary: PySpark SparkSession initialization takes longer every
iteration in a single application
Key: SPARK-20352
URL: https://issues.apache.org/jira/browse/SPARK-20352
Project: Spark
Issue Type: Question
Components: PySpark
Affects Versions: 2.1.0
Environment: Linux Ubuntu 12
pyspark
Reporter: hosein
Fix For: 2.1.0
I run Spark on a standalone Ubuntu server with 128 GB of memory and a 32-core
CPU, launched with spark-submit my_code.py and no additional configuration
parameters.
In a while loop I start a SparkSession, analyze data, and then stop the
context; this process repeats every 10 seconds.
#####################
from pyspark.sql import SparkSession

while True:
    spark = (SparkSession.builder
             .appName("sync_task")
             .config('spark.driver.maxResultSize', '5g')
             .getOrCreate())
    sc = spark.sparkContext
    # some process and analyze
    spark.stop()
#######################
When the program starts, it works perfectly, but after it has been running for
many hours, Spark initialization becomes slow: it takes 10 or 20 seconds just
to initialize Spark.
So what is the problem?
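(Not part of the original report, but for comparison: one common pattern is to build the SparkSession once, outside the loop, so the JVM gateway and SparkContext are created only one time. A minimal sketch, assuming the per-iteration analysis does not actually require a fresh context each time:)

```python
import time
from pyspark.sql import SparkSession

# Build the session once, outside the loop, so gateway/context
# setup happens a single time for the whole application.
spark = (SparkSession.builder
         .appName("sync_task")
         .config("spark.driver.maxResultSize", "5g")
         .getOrCreate())
sc = spark.sparkContext

try:
    while True:
        # some process and analyze, reusing the same session
        time.sleep(10)
finally:
    # Tear down once, when the application exits.
    spark.stop()
```

Whether this fits depends on why the session is being restarted every iteration in the first place.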
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]