As of today, on Cloud Dataflow a CoGBK is also executed as a GBK.
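
For readers unfamiliar with the two operators, here is a minimal pure-Python
sketch of what "CoGBK executed as a GBK" can look like, and of GBK expressed
as a single-input CoGBK as proposed below. It is only an in-memory model, not
Beam or Dataflow code: the function names and the "only" tag are hypothetical,
and real runners operate on distributed, windowed collections.

```python
from collections import defaultdict


def group_by_key_primitive(pairs):
    """Plain GBK model: [(key, value), ...] -> {key: [value, ...]}."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def co_group_by_key_via_gbk(tagged_inputs):
    """CoGBK run on top of GBK: tag values by input, flatten, group, untag.

    tagged_inputs maps an input name (tag) to a list of (key, value) pairs.
    Returns {key: {tag: [value, ...]}} with an entry for every tag.
    """
    tags = list(tagged_inputs)
    # Flatten all inputs into one collection, remembering each value's origin.
    flattened = [
        (key, (tag, value))
        for tag, pairs in tagged_inputs.items()
        for key, value in pairs
    ]
    result = {}
    for key, tagged_values in group_by_key_primitive(flattened).items():
        per_tag = {tag: [] for tag in tags}
        for tag, value in tagged_values:
            per_tag[tag].append(value)
        result[key] = per_tag
    return result


def group_by_key_via_cogbk(pairs):
    """GBK as a composite: a CoGBK with exactly one (hypothetically named) input."""
    joined = co_group_by_key_via_gbk({"only": pairs})
    return {key: per_tag["only"] for key, per_tag in joined.items()}


if __name__ == "__main__":
    emails = [("amy", "a@x"), ("bob", "b@x")]
    phones = [("amy", "111"), ("amy", "222")]
    print(co_group_by_key_via_gbk({"emails": emails, "phones": phones}))
    print(group_by_key_via_cogbk(phones))
```

The single-input case shows why going from CoGBK to GBK is the easy direction:
it only needs to drop the tag, while the other direction needs the tag, flatten
and untag steps.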

On Thu, Jul 21, 2016 at 2:56 AM, Aljoscha Krettek <[email protected]>
wrote:

> +1
>
> Out of curiosity, does Cloud Dataflow have a CoGBK primitive or will it
> also be executed as a GBK there?
>
> On Thu, 21 Jul 2016 at 02:29 Kam Kasravi <[email protected]> wrote:
>
> > +1 - awesome Manu.
> >
> >     On Wednesday, July 20, 2016 1:53 PM, Kenneth Knowles
> > <[email protected]> wrote:
> >
> >
> >  +1
> >
> > I assume that the intent is for the semantics of both GBK and CoGBK to be
> > unchanged, just swapping their status as primitives.
> >
> > This seems like a good change, with strictly positive impact on users and
> > > SDK authors, with only an extremely minor burden (doing an insertion of the
> > > provided implementation in the worst case) on runner authors.
> >
> > Kenn
> >
> >
> > On Wed, Jul 20, 2016 at 10:38 AM, Lukasz Cwik <[email protected]>
> > wrote:
> >
> > > > I would like to propose a change to Beam to make CoGBK the basis for
> > > > grouping instead of GBK. The idea behind this proposal is that CoGBK is a
> > > > more powerful operator than GBK, allowing for two key benefits:
> > >
> > > > 1) SDKs are simplified: transforming a CoGBK into a GBK is trivial while
> > > > the reverse is not.
> > > > 2) It will be easier for runners to provide more efficient implementations
> > > > of CoGBK as they will be responsible for the logic which takes their own
> > > > internal grouping implementation and maps it onto a CoGBK.
> > >
> > > This requires the following modifications to the Beam code base:
> > >
> > > 1) Make GBK a composite transform in terms of CoGBK.
> > > > 2) Move the CoGBK from contrib to runners-core as an adapter*. Runners that
> > > > more naturally support GBK can just use this and everything executes
> > > > exactly as before.
> > >
> > > > *just like GroupByKeyViaGroupByKeyOnly and UnboundedReadFromBoundedSource
> > >
> >
> >
> >
>
