+1 Nicely done!

On Tue, Oct 26, 2021 at 8:08 AM Chao Sun <[email protected]> wrote:

> Oops, sorry. I just fixed the permission setting.
>
> Thanks everyone for the positive support!
>
> On Tue, Oct 26, 2021 at 7:30 AM Wenchen Fan <[email protected]> wrote:
>
>> +1 to this SPIP and nice writeup of the design doc!
>>
>> Can we open comment permission in the doc so that we can discuss details
>> there?
>>
>> On Tue, Oct 26, 2021 at 8:29 PM Hyukjin Kwon <[email protected]> wrote:
>>
>>> Makes sense to me.
>>>
>>> Would be great to have some feedback from people such as @Wenchen Fan
>>> <[email protected]>, @Cheng Su <[email protected]>, and @angers zhu
>>> <[email protected]>.
>>>
>>> On Tue, 26 Oct 2021 at 17:25, Dongjoon Hyun <[email protected]> wrote:
>>>
>>>> +1 for this SPIP.
>>>>
>>>> On Sun, Oct 24, 2021 at 9:59 AM huaxin gao <[email protected]> wrote:
>>>>
>>>>> +1. Thanks for lifting the current restrictions on bucket join and
>>>>> making it more general.
>>>>>
>>>>> On Sun, Oct 24, 2021 at 9:33 AM Ryan Blue <[email protected]> wrote:
>>>>>
>>>>>> +1 from me as well. Thanks Chao for doing so much to get it to this
>>>>>> point!
>>>>>>
>>>>>> On Sat, Oct 23, 2021 at 11:29 PM DB Tsai <[email protected]> wrote:
>>>>>>
>>>>>>> +1 on this SPIP.
>>>>>>>
>>>>>>> This is a more generalized version of bucketed tables and bucketed
>>>>>>> joins, which can eliminate very expensive data shuffles during
>>>>>>> joins; many users in the Apache Spark community have wanted this
>>>>>>> feature for a long time!
>>>>>>>
>>>>>>> Thank you, Ryan and Chao, for working on this. I look forward to
>>>>>>> it as a new feature in Spark 3.3.
>>>>>>>
>>>>>>> DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1
>>>>>>>
>>>>>>> On Fri, Oct 22, 2021 at 12:18 PM Chao Sun <[email protected]> wrote:
>>>>>>> >
>>>>>>> > Hi,
>>>>>>> >
>>>>>>> > Ryan and I drafted a design doc to support a new type of join:
>>>>>>> > storage partitioned join, which covers bucket join support for
>>>>>>> > DataSourceV2 but is more general. The goal is to let Spark
>>>>>>> > leverage distribution properties reported by data sources and
>>>>>>> > eliminate shuffles whenever possible.
>>>>>>> >
>>>>>>> > Design doc:
>>>>>>> > https://docs.google.com/document/d/1foTkDSM91VxKgkEcBMsuAvEjNybjja-uHk-r3vtXWFE
>>>>>>> > (includes a POC link at the end)
>>>>>>> >
>>>>>>> > We'd like to start a discussion on the doc, and any feedback is
>>>>>>> > welcome!
>>>>>>> >
>>>>>>> > Thanks,
>>>>>>> > Chao
>>>>>>
>>>>>> --
>>>>>> Ryan Blue

--
John Zhuge
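[Editor's note: the shuffle elimination the thread discusses can be seen with today's built-in bucketing, which the SPIP generalizes to partition transforms reported by DataSourceV2 sources. A minimal Spark SQL sketch follows; table and column names are illustrative, not from the design doc.]

```sql
-- When both sides of a join are bucketed the same way on the join key,
-- Spark can skip the expensive shuffle (Exchange) before the join.
CREATE TABLE orders (order_id BIGINT, customer_id BIGINT, total DOUBLE)
  USING parquet CLUSTERED BY (customer_id) INTO 16 BUCKETS;

CREATE TABLE customers (customer_id BIGINT, name STRING)
  USING parquet CLUSTERED BY (customer_id) INTO 16 BUCKETS;

-- Both sides report the same hash distribution on customer_id, so the
-- physical plan needs no Exchange node before the join.
SELECT o.order_id, c.name
FROM orders o JOIN customers c ON o.customer_id = c.customer_id;
```

The SPIP extends this idea beyond identical bucket counts on built-in tables, so that any V2 source reporting compatible partitioning can join shuffle-free.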
