[ 
https://issues.apache.org/jira/browse/SPARK-20352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hosein updated SPARK-20352:
---------------------------
    Description: 
I run Spark on a standalone Ubuntu server with 128 GB of memory and a 32-core CPU, launched with spark-submit my_code.py and no additional configuration parameters.
In a while loop I start a SparkSession, analyze data, and then stop the context; this process repeats every 10 seconds.

{code}
from pyspark.sql import SparkSession

while True:
    spark = (SparkSession.builder
             .appName("sync_task")
             .config('spark.driver.maxResultSize', '5g')
             .getOrCreate())
    sc = spark.sparkContext
    # ... process and analyze the data ...
    spark.stop()
{code}

When the program starts, it works perfectly.

But after it has been running for many hours, Spark initialization starts taking a long time: 10 or 20 seconds just to initialize Spark.

So what is the problem?
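One way to narrow this down (a hypothetical diagnostic sketch, not part of the original report — the `timed` helper is mine) is to time each getOrCreate() call so the slowdown can be measured per iteration rather than estimated by eye:

```python
import time

def timed(label, fn, *args, **kwargs):
    """Call fn(*args, **kwargs), print the elapsed wall-clock time, and return (result, elapsed)."""
    start = time.monotonic()
    result = fn(*args, **kwargs)
    elapsed = time.monotonic() - start
    print(f"{label}: {elapsed:.2f}s")
    return result, elapsed

# In the loop above, session creation would be wrapped like:
#   spark, init_time = timed("SparkSession init",
#                            SparkSession.builder.appName("sync_task").getOrCreate)
# Logging init_time each iteration shows whether startup cost grows
# steadily (suggesting state accumulating in the driver JVM) or in jumps.
```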



> PySpark SparkSession initialization takes longer every iteration in a single 
> application
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-20352
>                 URL: https://issues.apache.org/jira/browse/SPARK-20352
>             Project: Spark
>          Issue Type: Question
>          Components: PySpark
>    Affects Versions: 2.1.0
>         Environment: Ubuntu 12
> Spark 2.1
> JRE 8.0
> Python 2.7
>            Reporter: hosein
>             Fix For: 2.1.0
>
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
