Hi Felix,

Just to clarify, we are voting on the SPIP, not the companion scoping doc. What is proposed, and what we are voting on, is to make Spark accelerator-aware. The companion scoping doc and the design sketch are there to illustrate what features could be implemented, based on the use cases and dev resources the co-authors are aware of. The exact scoping and design would require more community involvement; by no means are we finalizing it in this vote thread.
I think copying the goals and non-goals from the companion scoping doc to the SPIP caused the confusion. As mentioned in the SPIP, we proposed two major changes at a high level:

- At the cluster manager level, we update or upgrade cluster managers to include GPU support, and then expose user interfaces for Spark to request GPUs from them.
- Within Spark, we update its scheduler to understand which GPUs are allocated to each executor and what the user tasks request, and to assign GPUs to tasks properly.

We should keep our vote discussion at this level. It doesn't exclude Mesos/Windows/TPU/FPGA, nor does it commit to supporting YARN/K8s. Through the initial scoping work, we found that we certainly need domain experts to discuss the support of each cluster manager and each accelerator type. But adding more details on Mesos or FPGA doesn't change the SPIP at a high level. So we concluded the initial scoping, shared the docs, and started this vote.

I suggest updating the goals and non-goals in the SPIP so we don't turn the vote into a discussion of support or non-support for a specific cluster manager. After we reach a high-level agreement, the work can be fairly distributed. If there are both strong demand and dev resources from the community for a specific cluster manager or accelerator type, I don't see why we should block the work. If the work requires more discussion, we can start a new SPIP thread.

Also see my inline comments below.

On Sun, Mar 3, 2019 at 10:20 AM Felix Cheung <felixcheun...@hotmail.com> wrote:

> Great points Sean.
>
> Here’s what I’d like to suggest to move forward.
> Split the SPIP.
>
> If we want to propose upfront homogeneous allocation (aka spark.task.gpus), this should be one on its own and for instance,

This is more like an API/design discussion, which can be done after the vote. I don't think the feature alone needs a separate SPIP thread. At a high level, Spark users should be able to request and use GPUs properly; how to implement that is pending the design.

> I really agree with Sean (like I did in the discuss thread) that we can’t simply non-goal Mesos. We have enough maintenance issues as it is. And IIRC there was a PR proposed for K8S; I’d like to see that discussion brought here as well.

+1. As I mentioned above, discussing support for each cluster manager requires domain experts. The goals and non-goals in the SPIP caused this confusion. I suggest updating the goals and non-goals, and then having a separate discussion for each cluster manager that doesn't block the main SPIP vote. It would be great if you or Sean could lead the discussion on Mesos support.

> IMO upfront allocation is less useful. Specifically too expensive for large jobs.

This is also an API/design discussion.

> If we want per-stage resource requests, this should be a full SPIP with a lot more details to be hashed out. Our work with Horovod brings a few specific and critical requirements on how this should work with distributed DL, and I would like to see those addressed.

The SPIP is designed not to carry a lot of details. I agree with what Reynold said on the Table Metadata thread:

"""
In general it'd be better to have the SPIPs be higher level, and put the
detailed APIs in a separate doc. Alternatively, put them in the SPIP but
explicitly vote on the high level stuff and not the detailed APIs.
"""

Could you create a JIRA and document the list of requirements from the Horovod use cases?

> In any case I’d like to see more consensus before moving forward; until then I’m going to -1 this.
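To make the high-level proposal more concrete for this thread, here is a minimal sketch of what requesting and using GPUs might look like from the user side. Everything below is an illustrative assumption, not an API this vote would finalize: the configs (spark.executor.gpus, spark.task.gpus), the TaskContext resources accessor, and the runOnGpu stub are all hypothetical placeholders pending the actual design.

    // Submitted with hypothetical configs, e.g.:
    //   --conf spark.executor.gpus=4   (GPUs allocated per executor)
    //   --conf spark.task.gpus=1       (GPUs requested per task)
    import org.apache.spark.{SparkContext, TaskContext}

    object GpuSketch {
      // Stand-in for real GPU work (e.g. a DL inference call),
      // pinned to the device address assigned to this task.
      def runOnGpu(x: Int, gpuAddress: String): Int = x * 2

      def main(args: Array[String]): Unit = {
        val sc = SparkContext.getOrCreate()
        val result = sc.parallelize(1 to 100, 10).mapPartitions { iter =>
          // Hypothetical accessor: the scheduler tells the task which
          // GPU addresses it owns, so user code never guesses devices.
          val gpus = TaskContext.get().resources()("gpu").addresses
          iter.map(x => runOnGpu(x, gpus.head))
        }.collect()
        println(result.sum)
      }
    }

Whether such configs live at the application level (as above) or per stage is exactly the kind of API/design discussion that can happen after the vote.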
>
>
> ------------------------------
> *From:* Sean Owen <sro...@gmail.com>
> *Sent:* Sunday, March 3, 2019 8:15 AM
> *To:* Felix Cheung
> *Cc:* Xingbo Jiang; Yinan Li; dev; Weichen Xu; Marco Gaido
> *Subject:* Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling
>
> I'm for this in general, at least a +0. I do think this has to have a story for what to do with the existing Mesos GPU support, which sounds entirely like the spark.task.gpus config here. Maybe it's just a synonym? That kind of thing.
>
> Requesting different types of GPUs might be a bridge too far, but that's a P2 detail that can be hashed out later. (For example, if a V100 is available and a K80 was requested, do you use it or fail? Is the right level of resource control GPU RAM and cores?)
>
> The per-stage resource requirements sound like the biggest change; you can even change the CPU cores requested per pandas UDF? And what about memory then? We'll see how that shakes out. That's the only thing I'm kind of unsure about in this proposal.
>
> On Sat, Mar 2, 2019 at 9:35 PM Felix Cheung <felixcheun...@hotmail.com> wrote:
> >
> > I’m very hesitant with this.
> >
> > I don’t want to vote -1, because I personally think it’s important to do, but I’d like to see more discussion points addressed rather than vote purely on the spirit of it.
> >
> > First, the SPIP doesn’t match the SPIP format that was proposed and agreed on. (Maybe this is a minor point, and perhaps we should also vote to update the SPIP format.)
> >
> > Second, there are multiple PDFs/Google docs and JIRAs. And I think, for example, the design sketch is not covering the same points as the updated SPIP doc? It would help to make them align before moving forward.
> >
> > Third, the proposal touches on some fairly core and sensitive components, like the scheduler, and I think more discussions are necessary. We have a few comments there and in the JIRA.
> >
> >
> >
> > ________________________________
> > From: Marco Gaido <marcogaid...@gmail.com>
> > Sent: Saturday, March 2, 2019 4:18 AM
> > To: Weichen Xu
> > Cc: Yinan Li; Tom Graves; dev; Xingbo Jiang
> > Subject: Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling
> >
> > +1, a critical feature for AI/DL!
> >
> > On Sat, Mar 2, 2019 at 5:14 AM Weichen Xu <weichen...@databricks.com> wrote:
> >>
> >> +1, nice feature!
> >>
> >> On Sat, Mar 2, 2019 at 6:11 AM Yinan Li <liyinan...@gmail.com> wrote:
> >>>
> >>> +1
> >>>
> >>> On Fri, Mar 1, 2019 at 12:37 PM Tom Graves <tgraves...@yahoo.com.invalid> wrote:
> >>>>
> >>>> +1 for the SPIP.
> >>>>
> >>>> Tom
> >>>>
> >>>> On Friday, March 1, 2019, 8:14:43 AM CST, Xingbo Jiang <jiangxb1...@gmail.com> wrote:
> >>>>
> >>>>
> >>>> Hi all,
> >>>>
> >>>> I want to call for a vote on SPARK-24615. It improves Spark by making it aware of GPUs exposed by cluster managers, so that Spark can match GPU resources with user task requests properly. The proposal and production doc were made available on dev@ to collect input. You can also find a design sketch at SPARK-27005.
> >>>>
> >>>> The vote will be up for the next 72 hours. Please reply with your vote:
> >>>>
> >>>> +1: Yeah, let's go forward and implement the SPIP.
> >>>> +0: Don't really care.
> >>>> -1: I don't think this is a good idea because of the following technical reasons.
> >>>>
> >>>> Thank you!
> >>>>
> >>>> Xingbo
>