[ 
https://issues.apache.org/jira/browse/SPARK-20352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15970906#comment-15970906
 ] 

hosein commented on SPARK-20352:
--------------------------------

I monitored the execution time of every line in my code, and this line:

{code}
spark = SparkSession.builder \
    .appName("sync_task") \
    .config('spark.driver.maxResultSize', '5g') \
    .getOrCreate()
{code}

takes increasingly long (20 seconds or more) once my code has been running for hours.
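Since PySpark itself is not needed to demonstrate the measurement, here is a minimal pure-Python sketch of how one might time each loop iteration; {{build_session}} is a hypothetical placeholder for the real {{SparkSession.builder...getOrCreate()}} call.

```python
import time

# Hypothetical stand-in for SparkSession.builder...getOrCreate();
# swap in the real call when running under PySpark.
def build_session():
    time.sleep(0.01)  # stands in for session startup work

durations = []
for i in range(3):
    start = time.monotonic()
    build_session()
    durations.append(time.monotonic() - start)

# A steady upward drift in successive durations suggests state
# accumulating across spark.stop() cycles rather than a one-off
# slow first start.
print(all(d >= 0.01 for d in durations))
```

Logging the per-iteration duration this way makes it easy to see whether initialization time grows monotonically or jumps after a certain number of cycles.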

> PySpark SparkSession initialization takes longer every iteration in a single 
> application
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-20352
>                 URL: https://issues.apache.org/jira/browse/SPARK-20352
>             Project: Spark
>          Issue Type: Question
>          Components: PySpark
>    Affects Versions: 2.1.0
>         Environment: Ubuntu 12
> Spark 2.1
> JRE 8.0
> Python 2.7
>            Reporter: hosein
>
> I run Spark on a standalone Ubuntu server with 128G memory and a 32-core CPU. 
> I run spark-submit my_code.py without any additional configuration parameters.
> In a while loop I start a SparkSession, analyze the data, and then stop the 
> context; this process repeats every 10 seconds.
> {code}
> while True:
>     spark = SparkSession.builder.appName("sync_task") \
>         .config('spark.driver.maxResultSize', '5g') \
>         .getOrCreate()
>     sc = spark.sparkContext
>     # some processing and analysis
>     spark.stop()
> {code}
> When the program starts, it works perfectly, but after it has been running 
> for many hours, Spark initialization becomes slow: it takes 10 to 20 seconds 
> just to initialize Spark.
> So what is the problem?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
