[ https://issues.apache.org/jira/browse/SPARK-26439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-26439:
----------------------------------
    Affects Version/s:     (was: 2.4.0)
                       3.0.0

> Introduce WorkerOffer reservation mechanism for Barrier TaskSet
> ---------------------------------------------------------------
>
>                 Key: SPARK-26439
>                 URL: https://issues.apache.org/jira/browse/SPARK-26439
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: wuyi
>            Priority: Major
>              Labels: performance
>
> Currently, a Barrier TaskSet has a hard requirement: all of its tasks must
> be launched in a single resourceOffers round with enough slots (or
> sufficient resources). Even with enough slots, the launch is not guaranteed,
> because of task locality delay scheduling. So it is very likely that a
> Barrier TaskSet finally obtains sufficient resources, only to give them all
> up because one of its pending tasks cannot be scheduled. Furthermore, this
> causes severe resource competition between TaskSets and jobs, and
> introduces unclear semantics for DynamicAllocation.
>
> This JIRA tries to introduce a WorkerOffer reservation mechanism for
> Barrier TaskSets, which allows a Barrier TaskSet to reserve WorkerOffers in
> each resourceOffers round and to launch all of its tasks at the same time
> once it has accumulated sufficient resources. In this way, we relax the
> resource requirement for the Barrier TaskSet. To avoid the deadlock that may
> be introduced by several Barrier TaskSets holding reserved WorkerOffers for
> a long time, we'll ask Barrier TaskSets to forcibly release part of their
> reserved WorkerOffers on demand. So it is highly likely that each Barrier
> TaskSet will eventually be launched.
>
> To integrate with DynamicAllocation:
> One possible approach I can imagine is to add new events, e.g.
> ExecutorReservedEvent and ExecutorReleasedEvent, which behave like a busy
> executor with running tasks and an idle executor without running tasks,
> respectively.
> Thus, ExecutorAllocationManager would not remove an executor as long as it
> knows that some resources are still reserved on that executor.
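
To make the proposed reservation flow concrete, here is a minimal sketch in Python (not Spark code). It assumes a simplified WorkerOffer with only an executor id and a free-slot count. The BarrierReservation class and its offer_round and release methods are hypothetical names invented for this illustration and do not appear in the issue or in Spark. The sketch only models accumulating slots across rounds, launching all tasks at once, and force-releasing part of a reservation; it ignores locality and real resource accounting.

```python
from dataclasses import dataclass, field


@dataclass
class WorkerOffer:
    """Simplified stand-in for one executor's free slots in a resourceOffers round."""
    executor_id: str
    free_slots: int


@dataclass
class BarrierReservation:
    """Hypothetical per-TaskSet ledger of reserved slots (illustration only)."""
    num_tasks: int
    reserved: dict = field(default_factory=dict)  # executor_id -> reserved slot count

    def reserved_slots(self) -> int:
        return sum(self.reserved.values())

    def offer_round(self, offers):
        """Reserve slots from this round's offers.

        Returns one executor id per task once enough slots are held,
        otherwise None (the reservation is kept for the next round).
        """
        for offer in offers:
            missing = self.num_tasks - self.reserved_slots()
            if missing <= 0:
                break
            take = min(offer.free_slots, missing)
            if take > 0:
                # Reserved slots are no longer visible to other TaskSets.
                offer.free_slots -= take
                self.reserved[offer.executor_id] = (
                    self.reserved.get(offer.executor_id, 0) + take
                )
        if self.reserved_slots() < self.num_tasks:
            return None
        # Enough slots accumulated: launch every task in the same round.
        placements = [ex for ex, n in self.reserved.items() for _ in range(n)]
        self.reserved.clear()
        return placements

    def release(self, slots):
        """Forcibly give back up to `slots` reserved slots (deadlock avoidance)."""
        released = {}
        for ex in list(self.reserved):
            if slots <= 0:
                break
            give = min(self.reserved[ex], slots)
            self.reserved[ex] -= give
            if self.reserved[ex] == 0:
                del self.reserved[ex]
            released[ex] = give
            slots -= give
        return released


# Usage: a 3-task barrier stage fed by two rounds that are each too small alone.
res = BarrierReservation(num_tasks=3)
first = res.offer_round([WorkerOffer("exec-1", 2)])   # not enough yet, 2 slots held
second = res.offer_round([WorkerOffer("exec-2", 1)])  # third slot arrives, launch
```

In this sketch the first round returns None while holding two slots, and the second round returns a placement for all three tasks. Which reservations to release under contention, and when, is a policy question the description leaves open.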
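
The DynamicAllocation idea can be sketched the same way. The Python below is an illustration under assumptions, not Spark's ExecutorAllocationManager: ExecutorIdleTracker and its method names are hypothetical, and the two handlers merely stand in for the ExecutorReservedEvent and ExecutorReleasedEvent suggested in the description. The point it shows is that an executor holding a reservation is treated as busy even when it runs no tasks.

```python
from dataclasses import dataclass, field


@dataclass
class ExecutorIdleTracker:
    """Hypothetical idle tracker (illustration only, not a Spark class)."""
    running_tasks: dict = field(default_factory=dict)  # executor_id -> task count
    reserved: set = field(default_factory=set)         # executors holding reservations

    def on_executor_reserved(self, executor_id):
        # Stand-in for handling the proposed ExecutorReservedEvent.
        self.reserved.add(executor_id)

    def on_executor_released(self, executor_id):
        # Stand-in for handling the proposed ExecutorReleasedEvent.
        self.reserved.discard(executor_id)

    def on_task_start(self, executor_id):
        self.running_tasks[executor_id] = self.running_tasks.get(executor_id, 0) + 1

    def on_task_end(self, executor_id):
        self.running_tasks[executor_id] = max(0, self.running_tasks.get(executor_id, 0) - 1)

    def is_removable(self, executor_id) -> bool:
        """An executor may be removed only if it is idle and holds no reservation."""
        idle = self.running_tasks.get(executor_id, 0) == 0
        return idle and executor_id not in self.reserved


# Usage: a reserved executor is kept even though it runs no tasks.
tracker = ExecutorIdleTracker()
tracker.on_executor_reserved("exec-1")
keep = not tracker.is_removable("exec-1")
```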