I'll try to find some time, but it's really at a premium right now.

On Mon, Mar 4, 2019 at 3:17 PM Xiangrui Meng <men...@gmail.com> wrote:
> On Mon, Mar 4, 2019 at 3:10 PM Mark Hamstra <m...@clearstorydata.com> wrote:
>
>> :) Sorry, that was ambiguous. I was seconding Imran's comment.
>
> Could you also help review Xingbo's design sketch and help evaluate the
> cost?
>
>> On Mon, Mar 4, 2019 at 3:09 PM Xiangrui Meng <men...@gmail.com> wrote:
>>
>>> On Mon, Mar 4, 2019 at 1:56 PM Mark Hamstra <m...@clearstorydata.com>
>>> wrote:
>>>
>>>> +1
>>>
>>> Mark, just to be clear, are you +1 on the SPIP or Imran's point?
>>>
>>>> On Mon, Mar 4, 2019 at 12:52 PM Imran Rashid <im...@therashids.com>
>>>> wrote:
>>>>
>>>>> On Sun, Mar 3, 2019 at 6:51 PM Xiangrui Meng <men...@gmail.com> wrote:
>>>>>
>>>>>> On Sun, Mar 3, 2019 at 10:20 AM Felix Cheung
>>>>>> <felixcheun...@hotmail.com> wrote:
>>>>>>
>>>>>>> IMO upfront allocation is less useful. Specifically too expensive
>>>>>>> for large jobs.
>>>>>>
>>>>>> This is also an API/design discussion.
>>>>>
>>>>> I agree with Felix -- this is more than just an API question. It has
>>>>> a huge impact on the complexity of what you're proposing. You might be
>>>>> proposing big changes to a core and brittle part of Spark, which is
>>>>> already short of experts.
>>>
>>> To my understanding, Felix's comment is mostly on the user interfaces,
>>> stating upfront allocation is less useful, especially for large jobs. I
>>> agree that for large jobs we better have dynamic allocation, which was
>>> mentioned in the YARN support section in the companion scoping doc. We
>>> restrict the new container type to initially requested to keep things
>>> simple. However, upfront allocation already meets the requirements of
>>> basic workflows like data + DL training/inference + data. Saying "it is
>>> less useful specifically for large jobs" kinda missed the fact that "it
>>> is super useful for basic use cases".
>>>
>>> Your comment is mostly on the implementation side, which IMHO is the
>>> KEY question to conclude this vote: does the design sketch sufficiently
>>> demonstrate that the internal changes to the Spark scheduler are
>>> manageable? I read Xingbo's design sketch and I think it is doable,
>>> which led to my +1. But I'm not an expert on the scheduler, so I would
>>> feel more confident if the design was reviewed by some scheduler
>>> experts. I also read the design sketch to support different cluster
>>> managers, which I think is less critical than the internal scheduler
>>> changes.
>>>
>>>>> I don't see any value in having a vote on "does feature X sound cool?"
>>>
>>> I believe no one would disagree. To prepare the companion doc, we went
>>> through several rounds of discussions to provide concrete stories such
>>> that the proposal is not just "cool".
>>>
>>>>> We have to evaluate the potential benefit against the risks the feature
>>>>> brings and the continued maintenance cost. We don't need super low-level
>>>>> details, but we have to have a sketch of the design to be able to make
>>>>> that tradeoff.
>>>
>>> Could you review the design sketch from Xingbo, help evaluate the cost,
>>> and provide feedback?