As of today, Cloud Dataflow also executes CoGBK as a GBK.

On Thu, Jul 21, 2016 at 2:56 AM, Aljoscha Krettek <[email protected]> wrote:
> +1
>
> Out of curiosity, does Cloud Dataflow have a CoGBK primitive or will it
> also be executed as a GBK there?
>
> On Thu, 21 Jul 2016 at 02:29 Kam Kasravi <[email protected]> wrote:
>
> > +1 - awesome Manu.
> >
> > On Wednesday, July 20, 2016 1:53 PM, Kenneth Knowles
> > <[email protected]> wrote:
> >
> >
> > +1
> >
> > I assume that the intent is for the semantics of both GBK and CoGBK to
> > be unchanged, just swapping their status as primitives.
> >
> > This seems like a good change, with strictly positive impact on users
> > and SDK authors, with only an extremely minor burden (doing an insertion
> > of the provided implementation in the worst case) on runner authors.
> >
> > Kenn
> >
> >
> > On Wed, Jul 20, 2016 at 10:38 AM, Lukasz Cwik <[email protected]>
> > wrote:
> >
> > > I would like to propose a change to Beam to make CoGBK the basis for
> > > grouping instead of GBK. The idea behind this proposal is that CoGBK
> > > is a more powerful operator than GBK, allowing for two key benefits:
> > >
> > > 1) SDKs are simplified: transforming a CoGBK into a GBK is trivial
> > > while the reverse is not.
> > > 2) It will be easier for runners to provide more efficient
> > > implementations of CoGBK as they will be responsible for the logic
> > > which takes their own internal grouping implementation and maps it
> > > onto a CoGBK.
> > >
> > > This requires the following modifications to the Beam code base:
> > >
> > > 1) Make GBK a composite transform in terms of CoGBK.
> > > 2) Move the CoGBK from contrib to runners-core as an adapter*. Runners
> > > that more naturally support GBK can just use this and everything
> > > executes exactly as before.
> > >
> > > *just like GroupByKeyViaGroupByKeyOnly and
> > > UnboundedReadFromBoundedSource
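
To illustrate point 1 of the proposal (GBK as a composite in terms of CoGBK), here is a minimal in-memory sketch in plain Python. It does not use the Beam SDK and is not Beam's implementation: the function names co_group_by_key and group_by_key and the tag name "only" are hypothetical, and windowing, triggering, and coders are ignored entirely. It only shows why the GBK-from-CoGBK direction is a small wrapper.

```python
def co_group_by_key(tagged_inputs):
    """Hypothetical CoGBK: join several keyed collections by key.

    tagged_inputs maps a tag to an iterable of (key, value) pairs.
    Returns {key: {tag: [values...]}}, with an empty list for any tag
    that has no values for a given key.
    """
    tags = list(tagged_inputs)
    result = {}
    for tag, pairs in tagged_inputs.items():
        for key, value in pairs:
            if key not in result:
                # First time this key is seen: create a slot for every tag.
                result[key] = {t: [] for t in tags}
            result[key][tag].append(value)
    return result


def group_by_key(pairs):
    """Hypothetical GBK expressed as a composite over a one-input CoGBK."""
    only_tag = "only"  # illustrative tag name, not a Beam identifier
    grouped = co_group_by_key({only_tag: pairs})
    # Unwrap the single tag so callers see plain {key: [values...]}.
    return {key: by_tag[only_tag] for key, by_tag in grouped.items()}
```

Going the other way (building CoGBK from GBK) would need extra steps such as tagging each value with its source, flattening the inputs, and splitting the grouped values back out by tag, which is the asymmetry the proposal refers to.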
