To be clear, those goals sound fine to me. I don't think voting on
those two broad points is meaningful, but it does no harm per se. If you
mean this is just a check on whether people believe this is broadly
worthwhile, then +1 from me. Yes it is.

That means we'd want to review something more detailed later, whether
it's (a) a design doc we vote on or (b) a series of pull requests. Given
the number of questions this leaves open, (a) sounds better, and I think
it's what you're suggesting. I'd call that the SPIP, but, so what, it's
just a name. The thing is, (a) seems already mostly done, in the second
document that was attached. I'm hesitating because I'm not sure why
it's important not to discuss that level of detail here, as it's
already available. Just too much noise? But voting for this seems like
endorsing those decisions, as I can only assume the proposer is going
to continue the design with those decisions in mind.

What's the next step in your view, after this and before it's
implemented? As long as there is one, sure, let's punt. Seems like we
could begin that conversation now-ish.

Many of those questions you list are _fine_ for a SPIP, in my opinion.
(Of course, I'd add the question of which cluster managers are in and
out of scope.)
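To make concrete what I mean by SPIP-level detail: I'd expect the doc
to pin down the user-facing surface, roughly along these lines. This is
purely a sketch of mine; the config keys below are invented for
illustration and aren't part of the current proposal:

    import org.apache.spark.SparkConf

    // Hypothetical config keys, for illustration only -- not proposed names:
    val conf = new SparkConf()
      .set("spark.executor.resource.gpu.amount", "2") // GPUs requested per executor
      .set("spark.task.resource.gpu.amount", "1")     // GPUs required per task

Whether it ends up looking like plain conf keys or a richer request
API, and how it maps onto each cluster manager, is exactly the level of
decision the SPIP should settle.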


On Mon, Mar 4, 2019 at 9:07 AM Xiangrui Meng <men...@gmail.com> wrote:
>
> What finer "high level" goals do you recommend? To make progress on the vote,
> it would be great if you could articulate them more. The current SPIP proposes
> two high-level changes to make Spark accelerator-aware:
>
> * At the cluster manager level, we update or upgrade cluster managers to
> include GPU support, then expose user interfaces for Spark to request GPUs
> from them.
> * Within Spark, we update the scheduler to understand which GPUs are
> allocated to executors and what tasks request, and to assign GPUs to tasks
> properly.
>
> How do you want to change or refine them? I saw you raised questions around
> Horovod requirements and GPU/memory allocation, but there are dozens of
> questions at the same or even higher level. E.g., in preparing the companion
> scoping doc we saw the following questions:
>
> * How do we test GPU support on Jenkins?
> * Does the proposed solution also work for FPGAs? What are the differences?
> * How do we make standalone workers auto-discover GPU resources?
> * Do we want to allow users to request GPU resources in Pandas UDFs?
> * How does a user pass GPU requests to K8s: via the spark-submit command
> line or a pod template?
> * Do we create a separate queue for GPU task scheduling so it doesn't cause
> regressions on normal jobs?
> * How do we monitor GPU utilization? At what levels?
> * Do we want to support GPU-backed physical operators?
> * Do we allow users to request non-default numbers of both CPUs and GPUs?
> * ...
>
> IMHO, we cannot, nor should we, answer questions at this level in this vote.
> The vote is mainly on whether we should make Spark accelerator-aware to help
> unify big data and AI solutions, specifically whether Spark should provide
> proper support for deep learning model training and inference, where
> accelerators are essential. My +1 vote is based on the following logic:
>
> * It is important for Spark to become the de facto solution in connecting big 
> data and AI.
> * The work is doable given the design sketch and the early 
> investigation/scoping.
>
> To me, "-1" means either that it is not important for Spark to support such
> use cases or that we certainly cannot afford to implement such support. This
> is my understanding of the SPIP and the vote. It would be great if you could
> elaborate on what changes you want to make or what answers you want to see.
>
