+1 (non-binding)

On Mon, Jun 17, 2019 at 1:58 PM Ryan Blue <rb...@netflix.com.invalid> wrote:
> +1 (non-binding)
>
> On Sun, Jun 16, 2019 at 11:11 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>
>> +1
>>
>> Bests,
>> Dongjoon.
>>
>> On Sun, Jun 16, 2019 at 9:41 PM Saisai Shao <sai.sai.s...@gmail.com> wrote:
>>
>>> +1 (binding)
>>>
>>> Thanks
>>> Saisai
>>>
>>> On Sat, Jun 15, 2019 at 3:46 AM Imran Rashid <im...@therashids.com> wrote:
>>>
>>>> +1 (binding)
>>>>
>>>> I think this is a really important feature for Spark.
>>>>
>>>> First, there is already a lot of interest in alternative shuffle
>>>> storage in the community, from dynamic allocation in Kubernetes to
>>>> simply improving stability in standard on-premise use of Spark.
>>>> However, people are often stuck doing this in forks of Spark, in ways
>>>> that are either not maintainable (because they copy-paste many Spark
>>>> internals) or incorrect (because they do not correctly handle
>>>> speculative execution and stage retries).
>>>>
>>>> Second, I think the specific proposal strikes the right balance
>>>> between flexibility and too much complexity, while allowing
>>>> incremental improvements. A lot of work has already gone into
>>>> figuring out which pieces are essential to make alternative shuffle
>>>> storage implementations feasible.
>>>>
>>>> Of course, that means it doesn't include everything imaginable; some
>>>> things still aren't supported, and some users will still choose the
>>>> older ShuffleManager API to get total control over all of shuffle.
>>>> But we know there is a reasonable set of things that can be
>>>> implemented behind the API as a first step, and it can continue to
>>>> evolve.
>>>>
>>>> On Fri, Jun 14, 2019 at 12:13 PM Ilan Filonenko <i...@cornell.edu> wrote:
>>>>
>>>>> +1 (non-binding). This API is versatile and flexible enough to handle
>>>>> Bloomberg's internal use cases. The ability for us to vary
>>>>> implementation strategies is quite appealing. It is also worth noting
>>>>> how few changes to Spark core are needed to make it work. This is a
>>>>> much-needed addition to the Spark shuffle story.
>>>>>
>>>>> On Fri, Jun 14, 2019 at 9:59 AM bo yang <bobyan...@gmail.com> wrote:
>>>>>
>>>>>> +1 This is great work, allowing different sort-shuffle write/read
>>>>>> implementations to be plugged in! It is also great to see it retain
>>>>>> the current Spark configuration
>>>>>> (spark.shuffle.manager=org.apache.spark.shuffle.YourShuffleManagerImpl).
>>>>>>
>>>>>> On Thu, Jun 13, 2019 at 2:58 PM Matt Cheah <mch...@palantir.com> wrote:
>>>>>>
>>>>>>> Hi everyone,
>>>>>>>
>>>>>>> I would like to call a vote for the SPIP for SPARK-25299
>>>>>>> <https://issues.apache.org/jira/browse/SPARK-25299>, which proposes
>>>>>>> to introduce a pluggable storage API for temporary shuffle data.
>>>>>>>
>>>>>>> You may find the SPIP document here
>>>>>>> <https://docs.google.com/document/d/1d6egnL6WHOwWZe8MWv3m8n4PToNacdx7n_0iMSWwhCQ/edit>.
>>>>>>>
>>>>>>> The discussion thread for the SPIP was conducted here
>>>>>>> <https://lists.apache.org/thread.html/2fe82b6b86daadb1d2edaef66a2d1c4dd2f45449656098ee38c50079@%3Cdev.spark.apache.org%3E>.
>>>>>>>
>>>>>>> Please vote on whether or not this proposal is agreeable to you.
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> -Matt Cheah
>
> --
> Ryan Blue
> Software Engineer
> Netflix