[
https://issues.apache.org/jira/browse/SPARK-33418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
dingbei updated SPARK-33418:
----------------------------
Attachment: (was: 1.png)
> TaskSchedulerImpl: Check pending tasks in advance when resource offers
> ----------------------------------------------------------------------
>
> Key: SPARK-33418
> URL: https://issues.apache.org/jira/browse/SPARK-33418
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.0.1
> Reporter: dingbei
> Priority: Major
> Attachments: 2.png, 3.jpg, 4.jpg, 5.png, 6.jpg
>
>
> This began with the need to start a large number of Spark Streaming receivers. *The
> launch time becomes very long once there are more than 300 receivers.* Below are the
> test data I collected and how I improved this.
> *Test preparation*
> Every executor has two cores (one for the receiver and the other to process
> each batch of data). I measured the launch time of all receivers through the
> Spark web UI (Total Uptime when the last receiver started).
> *Tests and data*
> First we set the number of executors to 200, which means starting 200
> receivers, and everything went well: it took about 50s to launch all
> receivers ({color:#ff0000}pic 1{color}).
> Then we set the number of executors to 500, which means starting 500
> receivers. The launch time grew to around 5 minutes ({color:#ff0000}pic 2{color}).
> *Digging into the source code*
> I then looked for the cause in the source code. I used thread dumps to
> identify which methods take a relatively long time ({color:#ff0000}pic 3{color}).
> Then I added logs around these methods and found that the loop in
> {color:#00875a}TaskSchedulerImpl.resourceOffers{color} executes more than
> 600,000 times ({color:#ff0000}pic 4{color}).
> *Explanation and solution*
> The loop in TaskSchedulerImpl.resourceOffers iterates over all non-zombie
> TaskSetManagers in the Pool's queue. Normally this queue stays small, because
> a TaskSetManager is removed once all of its tasks are done. But for Spark
> Streaming jobs, each receiver is wrapped as a long-running job, which means
> its TaskSetManager stays in the queue until the application finishes. For
> example, when the 10th receiver is launched the queue size is 10, so the loop
> iterates 10 times; when the 500th receiver is launched, it iterates 500
> times. However, 499 of those iterations are unnecessary, since their tasks
> are already running.
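A rough back-of-the-envelope model of the growth described above (this is a hypothetical sketch, not Spark code; real counts are higher because resourceOffers runs repeatedly on every revive and per locality level):

```python
# Hypothetical model: when receiver k's task is scheduled, roughly k
# TaskSetManagers are still queued in the Pool, so one scheduling round
# visits every queued manager even though k-1 of them have no pending work.
def total_iterations(num_receivers: int) -> int:
    # Sum of queue sizes seen while launching receivers 1..N one at a time.
    return sum(k for k in range(1, num_receivers + 1))

print(total_iterations(200))  # 20100 manager visits
print(total_iterations(500))  # 125250 -- quadratic growth
```

This quadratic growth is why 200 receivers launch quickly while 500 take minutes, consistent with the observed behavior.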
> Digging deeper into the code, I found that whether a TaskSetManager still
> has pending tasks is decided in
> {color:#00875a}TaskSetManager.dequeueTaskFromList{color} ({color:#ff0000}pic
> 5{color}), which is far away from the loop in
> {color:#00875a}TaskSchedulerImpl.resourceOffers{color}. So I moved the
> pending-task check ahead, into the loop in
> {color:#00875a}TaskSchedulerImpl.resourceOffers{color} ({color:#ff0000}pic
> 6{color}), and I also accounted for speculation mode.
> *Conclusion*
> I think the Spark contributors have not considered a scenario where many jobs
> run at the same time, which I know is unusual but is still worth handling.
> With this change we reliably reduced the launch time of all receivers (500
> receivers) to around 50s.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]