Re: [VOTE][SPARK-25299] SPIP: Shuffle Storage API

Yinan Li Mon, 17 Jun 2019 19:15:46 -0700

+1 (non-binding)

On Mon, Jun 17, 2019 at 1:58 PM Ryan Blue <[email protected]> wrote:


> +1 (non-binding)
>
> On Sun, Jun 16, 2019 at 11:11 PM Dongjoon Hyun <[email protected]>
> wrote:
>
>> +1
>>
>> Bests,
>> Dongjoon.
>>
>>
>> On Sun, Jun 16, 2019 at 9:41 PM Saisai Shao <[email protected]>
>> wrote:
>>
>>> +1 (binding)
>>>
>>> Thanks
>>> Saisai
>>>
>>> Imran Rashid <[email protected]> wrote on Sat, Jun 15, 2019 at 3:46 AM:
>>>
>>>> +1 (binding)
>>>>
>>>> I think this is a really important feature for Spark.
>>>>
>>>> First, there is already a lot of interest in alternative shuffle
>>>> storage in the community, from dynamic allocation in Kubernetes to even
>>>> just improving stability in standard on-premise use of Spark.  However,
>>>> people are often stuck doing this in forks of Spark, and in ways that are
>>>> either not maintainable (because they copy-paste many Spark internals) or
>>>> incorrect (because they do not correctly handle speculative execution and
>>>> stage retries).
>>>>
>>>> Second, I think the specific proposal strikes the right balance
>>>> between flexibility and complexity, which allows incremental
>>>> improvements.  A lot of work has already been put into figuring out
>>>> which pieces are essential to make alternative shuffle storage
>>>> implementations feasible.
>>>>
>>>> Of course, that means it doesn't include everything imaginable; some
>>>> things still aren't supported, and some people will still choose to use
>>>> the older ShuffleManager API to get total control over all of shuffle.
>>>> But we know there is a reasonable set of things which can be implemented
>>>> behind the API as the first step, and it can continue to evolve.
>>>>
>>>> On Fri, Jun 14, 2019 at 12:13 PM Ilan Filonenko <[email protected]>
>>>> wrote:
>>>>
>>>>> +1 (non-binding). This API is versatile and flexible enough to handle
>>>>> Bloomberg's internal use cases. The ability for us to vary implementation
>>>>> strategies is quite appealing. It is also worth noting how minimal the
>>>>> changes to Spark core are in order to make it work. This is a much-needed
>>>>> addition to the Spark shuffle story.
>>>>>
>>>>> On Fri, Jun 14, 2019 at 9:59 AM bo yang <[email protected]> wrote:
>>>>>
>>>>>> +1 This is great work, allowing different sort shuffle write/read
>>>>>> implementations to be plugged in! Also great to see it retain the
>>>>>> current Spark configuration
>>>>>> (spark.shuffle.manager=org.apache.spark.shuffle.YourShuffleManagerImpl).
>>>>>>
>>>>>>
>>>>>> On Thu, Jun 13, 2019 at 2:58 PM Matt Cheah <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi everyone,
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I would like to call a vote for the SPIP for SPARK-25299
>>>>>>> <https://issues.apache.org/jira/browse/SPARK-25299>, which proposes
>>>>>>> to introduce a pluggable storage API for temporary shuffle data.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> You may find the SPIP document here
>>>>>>> <https://docs.google.com/document/d/1d6egnL6WHOwWZe8MWv3m8n4PToNacdx7n_0iMSWwhCQ/edit>.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> The discussion thread for the SPIP was conducted here
>>>>>>> <https://lists.apache.org/thread.html/2fe82b6b86daadb1d2edaef66a2d1c4dd2f45449656098ee38c50079@%3Cdev.spark.apache.org%3E>.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Please vote on whether or not this proposal is agreeable to you.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -Matt Cheah
>>>>>>>
>>>>>>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>
