[
https://issues.apache.org/jira/browse/SPARK-33418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
dingbei updated SPARK-33418:
----------------------------
Description:
It began with the need to start a large number of Spark Streaming receivers.
The launch time becomes very long once there are more than 300 receivers. I
will show the test data I collected and how I improved this. Each executor has
two cores (one for the receiver and the other to process batch jobs).
There are two main metrics I will mention below.
Receiver launch time: the time from when the first receiver starts until the
last one has started (observed through the Spark web UI).
At first, we set the number of executors to 200, which means starting 200
receivers, and everything went well. The launch time was around 50s and the
processing time for every batch was around 10s (picture 1).
Then we set the number of executors to 500, which means starting 500 receivers.
The launch time rose to around 5 mins and the processing time for every batch
was around 2 mins (picture 2).
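As a rough check on the two measurements above, the following sketch compares
how the launch time grows with the number of receivers. The figures are the
approximate values quoted in this description; the names `runs` and
`per_receiver` are only illustrative and are not part of Spark.

```python
# Approximate measurements from the description: receivers -> launch time (s).
runs = {200: 50, 500: 5 * 60}


def per_receiver(receivers, launch_seconds):
    """Average launch time spent per receiver, in seconds."""
    return launch_seconds / receivers


for receivers, launch_seconds in runs.items():
    avg = per_receiver(receivers, launch_seconds)
    print(f"{receivers} receivers: {launch_seconds}s total, {avg:.2f}s each")

# How much the receiver count grew versus how much the launch time grew.
receiver_ratio = 500 / 200
launch_ratio = runs[500] / runs[200]
print(f"receivers x{receiver_ratio}, launch time x{launch_ratio}")
```

If launch time scaled linearly, both ratios would be about the same. With
these rough numbers the launch time grows noticeably faster than the receiver
count, which is why the scheduling path is worth inspecting.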
Then I started to look for the cause in the source code. I used thread dumps to
check which methods take a relatively long time.
was:
It began with the need to start a large number of Spark Streaming receivers.
The launch time becomes very long once there are more than 300 receivers. I
will show the test data I collected and how I improved this. Each executor has
two cores (one for the receiver and the other to process batch jobs), and there
is a batch of data every 10s.
There are two main metrics I will mention below.
Receiver launch time: the time from when the first receiver starts until the
last one has started (observed through the Spark web UI).
Batch data processing time: the time it takes to process a batch of data
(observed through the Streaming tab of the Spark web UI).
At first, we set the number of executors to 200, which means starting 200
receivers, and everything went well. The launch time was around 50s and the
processing time for every batch was around 10s.
Then we set the number of executors to 500, which means starting 500 receivers
> TaskSchedulerImpl: Check pending tasks in advance when resource offers
> ----------------------------------------------------------------------
>
> Key: SPARK-33418
> URL: https://issues.apache.org/jira/browse/SPARK-33418
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.0.1
> Reporter: dingbei
> Priority: Major
>
> It began with the need to start a large number of Spark Streaming receivers.
> The launch time becomes very long once there are more than 300 receivers. I
> will show the test data I collected and how I improved this. Each executor
> has two cores (one for the receiver and the other to process batch jobs).
> There are two main metrics I will mention below.
> Receiver launch time: the time from when the first receiver starts until the
> last one has started (observed through the Spark web UI).
>
> At first, we set the number of executors to 200, which means starting 200
> receivers, and everything went well. The launch time was around 50s and the
> processing time for every batch was around 10s (picture 1).
> Then we set the number of executors to 500, which means starting 500
> receivers. The launch time rose to around 5 mins and the processing time for
> every batch was around 2 mins (picture 2).
>
> Then I started to look for the cause in the source code. I used thread dumps
> to check which methods take a relatively long time.