This makes sense to me.

Would be great to have some feedback from people such as @Wenchen Fan
<wenc...@databricks.com> @Cheng Su <chen...@fb.com> @angers zhu
<angers....@gmail.com>.


On Tue, 26 Oct 2021 at 17:25, Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:

> +1 for this SPIP.
>
> On Sun, Oct 24, 2021 at 9:59 AM huaxin gao <huaxin.ga...@gmail.com> wrote:
>
>> +1. Thanks for lifting the current restrictions on bucket join and making
>> this more generalized.
>>
>> On Sun, Oct 24, 2021 at 9:33 AM Ryan Blue <b...@apache.org> wrote:
>>
>>> +1 from me as well. Thanks Chao for doing so much to get it to this
>>> point!
>>>
>>> On Sat, Oct 23, 2021 at 11:29 PM DB Tsai <dbt...@dbtsai.com> wrote:
>>>
>>>> +1 on this SPIP.
>>>>
>>>> This is a more generalized version of bucketed tables and bucketed
>>>> joins, which can eliminate very expensive data shuffles during joins.
>>>> Many users in the Apache Spark community have wanted this feature for
>>>> a long time!
>>>>
>>>> Thank you, Ryan and Chao, for working on this. I look forward to
>>>> it as a new feature in Spark 3.3.
>>>>
>>>> DB Tsai  |  https://www.dbtsai.com/  |  PGP 42E5B25A8F7A82C1
>>>>
>>>> On Fri, Oct 22, 2021 at 12:18 PM Chao Sun <sunc...@apache.org> wrote:
>>>> >
>>>> > Hi,
>>>> >
>>>> > Ryan and I drafted a design doc to support a new type of join, the
>>>> > storage partitioned join, which covers bucket join support for
>>>> > DataSourceV2 but is more general. The goal is to let Spark leverage
>>>> > distribution properties reported by data sources and eliminate
>>>> > shuffles whenever possible.
>>>> >
>>>> > Design doc:
>>>> > https://docs.google.com/document/d/1foTkDSM91VxKgkEcBMsuAvEjNybjja-uHk-r3vtXWFE
>>>> > (includes a POC link at the end)
>>>> >
>>>> > We'd like to start a discussion on the doc, and any feedback is
>>>> > welcome!
>>>> >
>>>> > Thanks,
>>>> > Chao
>>>>
>>>
>>>
>>> --
>>> Ryan Blue
>>>
>>
