Great points, Sean.

Here’s what I’d like to suggest to move forward: split the SPIP.

If we want to propose upfront homogeneous allocation (aka spark.task.gpus),
this should be an SPIP on its own. For instance, I really agree with Sean
(like I did in the discuss thread) that we can’t simply declare Mesos a
non-goal; we have enough maintenance issues as it is. And IIRC there was a PR
proposed for K8S, and I’d like to see that discussion brought here as well.

IMO upfront allocation is less useful; specifically, it is too expensive for
large jobs.
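To make the cost concern concrete, here is a minimal sketch of what upfront
homogeneous allocation might look like. Only spark.task.gpus is named in this
thread; the executor-side conf and the exact setup are my assumption, not
anything settled in the SPIP:

    import org.apache.spark.{SparkConf, SparkContext}

    // Hypothetical upfront, job-wide GPU request; conf names are illustrative.
    val conf = new SparkConf()
      .setAppName("train")
      .set("spark.executor.gpus", "8") // assumed executor-level analogue
      .set("spark.task.gpus", "2")     // every task holds 2 GPUs, job-wide
    val sc = new SparkContext(conf)

Every stage of such a job, including plain CPU-bound ETL, reserves GPUs under
this model, which is exactly why it gets expensive for large jobs.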

If we want per-stage resource requests, this should be a full SPIP with a lot
more details hashed out. Our work with Horovod brings a few specific and
critical requirements on how this should work with distributed DL, and I
would like to see those addressed (see the sketch below).
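For reference, here is the kind of per-stage API such an SPIP would need to
pin down, sketched purely for illustration. withResources and its parameters
are hypothetical (not from the SPIP); barrier() is the existing barrier-mode
API, and rawData, preprocess, and trainWithHorovod are placeholders:

    // Hypothetical per-stage request: only the training stage asks for GPUs.
    val features = rawData.map(preprocess)      // ordinary CPU stage
    val model = features
      .withResources(gpus = 4, cpusPerTask = 2) // illustrative API, not settled
      .barrier()                                // gang-schedules the stage's tasks
      .mapPartitions(trainWithHorovod)

Horovod in particular needs all tasks of a GPU stage launched together, with
each task knowing which devices it was assigned; those are the kinds of
requirements I would want spelled out.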

In any case, I’d like to see more consensus before moving forward; until then
I’m going to -1 this.



________________________________
From: Sean Owen <sro...@gmail.com>
Sent: Sunday, March 3, 2019 8:15 AM
To: Felix Cheung
Cc: Xingbo Jiang; Yinan Li; dev; Weichen Xu; Marco Gaido
Subject: Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

I'm for this in general, at least a +0. I do think this has to have a
story for what to do with the existing Mesos GPU support, which sounds
entirely like the spark.task.gpus config here. Maybe it's just a
synonym? That kind of thing.

Requesting different types of GPUs might be a bridge too far, but
that's a P2 detail that can be hashed out later. (For example, if a
V100 is available and a K80 was requested, do you use it or fail? Is
the right level of resource control GPU RAM and cores?)

The per-stage resource requirements sound like the biggest change;
you can even change the CPU cores requested per pandas UDF? And what
about memory then? We'll see how that shakes out. That's the only
thing I'm kind of unsure about in this proposal.

On Sat, Mar 2, 2019 at 9:35 PM Felix Cheung <felixcheun...@hotmail.com> wrote:
>
> I’m very hesitant with this.
>
> I don’t want to vote -1, because I personally think it’s important to do, but
> I’d like to see more discussion points addressed rather than voting purely on
> the spirit of it.
>
> First, the SPIP doesn’t match the SPIP format that was proposed and agreed
> on. (Maybe this is a minor point, and perhaps we should also vote to update
> the SPIP format.)
>
> Second, there are multiple PDFs/Google docs and JIRAs. And I think, for
> example, the design sketch does not cover the same points as the updated SPIP
> doc? It would help to align them before moving forward.
>
> Third, the proposal touches on some fairly core and sensitive components,
> like the scheduler, and I think more discussion is necessary. We have a few
> comments there and in the JIRA.
>
>
>
> ________________________________
> From: Marco Gaido <marcogaid...@gmail.com>
> Sent: Saturday, March 2, 2019 4:18 AM
> To: Weichen Xu
> Cc: Yinan Li; Tom Graves; dev; Xingbo Jiang
> Subject: Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling
>
> +1, a critical feature for AI/DL!
>
> Il giorno sab 2 mar 2019 alle ore 05:14 Weichen Xu 
> <weichen...@databricks.com> ha scritto:
>>
>> +1, nice feature!
>>
>> On Sat, Mar 2, 2019 at 6:11 AM Yinan Li <liyinan...@gmail.com> wrote:
>>>
>>> +1
>>>
>>> On Fri, Mar 1, 2019 at 12:37 PM Tom Graves <tgraves...@yahoo.com.invalid> 
>>> wrote:
>>>>
>>>> +1 for the SPIP.
>>>>
>>>> Tom
>>>>
>>>> On Friday, March 1, 2019, 8:14:43 AM CST, Xingbo Jiang 
>>>> <jiangxb1...@gmail.com> wrote:
>>>>
>>>>
>>>> Hi all,
>>>>
>>>> I want to call for a vote on SPARK-24615. It improves Spark by making it 
>>>> aware of GPUs exposed by cluster managers, so that Spark can match GPU 
>>>> resources with user task requests properly. The proposal and production 
>>>> doc were made available on dev@ to collect input. You can also find a 
>>>> design sketch at SPARK-27005.
>>>>
>>>> The vote will be up for the next 72 hours. Please reply with your vote:
>>>>
>>>> +1: Yeah, let's go forward and implement the SPIP.
>>>> +0: Don't really care.
>>>> -1: I don't think this is a good idea because of the following technical 
>>>> reasons.
>>>>
>>>> Thank you!
>>>>
>>>> Xingbo
