Re: [DISCUSS] KIP-150 - Kafka-Streams Cogroup

Xavier Léauté Wed, 14 Jun 2017 17:11:38 -0700

+1 from me

any stream should be able to initialize the cogroup


On Wed, Jun 14, 2017 at 3:44 PM Kyle Winkelman wrote:

> I will update the KIP to have only the aggregator in the first cogroup call
> and the initializer and serde in the aggregate calls.
>
> This seems to be the consensus on the email chain.
>
> Thanks,
> Kyle
>
> On Jun 14, 2017 5:41 PM, wrote:
>
> That is not the case. No matter which stream the record comes in on, the
> initializer is called if there is not currently an object in the store.
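To make these semantics concrete, here is a minimal in-memory sketch. It is not the Kafka Streams implementation: a plain `HashMap` stands in for the state store, and `ToyCogroupStore` and `process` are hypothetical names. The initializer runs whenever a key is absent, regardless of which input stream delivered the record.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Supplier;

class InitializerDemo {
    // Hypothetical toy model of a cogroup state store; NOT the Kafka Streams API.
    static final class ToyCogroupStore<K, VA> {
        private final Map<K, VA> store = new HashMap<>();
        private final Supplier<VA> initializer;
        int initializerCalls = 0; // how many times the initializer has run

        ToyCogroupStore(Supplier<VA> initializer) {
            this.initializer = initializer;
        }

        // Folds one record into the store using the aggregator of the stream it came from.
        // The initializer runs whenever the key is absent, whichever stream delivered it.
        <V> VA process(K key, V value, BiFunction<V, VA, VA> aggregator) {
            VA current = store.get(key);
            if (current == null) {
                current = initializer.get();
                initializerCalls++;
            }
            VA updated = aggregator.apply(value, current);
            store.put(key, updated);
            return updated;
        }

        VA get(K key) {
            return store.get(key);
        }
    }

    public static void main(String[] args) {
        ToyCogroupStore<String, String> store = new ToyCogroupStore<>(() -> "");
        // The first record for key "a" arrives on the second stream (Integer values)...
        store.process("a", 42, (Integer v, String agg) -> agg + "s2:" + v + ";");
        // ...and a later one on the first stream (String values).
        store.process("a", "x", (String v, String agg) -> agg + "s1:" + v + ";");
        System.out.println(store.get("a"));         // s2:42;s1:x;
        System.out.println(store.initializerCalls); // 1
    }
}
```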
>
> On Jun 14, 2017 5:12 PM, "Guozhang Wang" wrote:
>
> Regarding where we should ask users to set serdes: I think I'm
> convinced by Xavier that they should be in the `aggregate` call (but only
> in those that do not pass in a state store supplier) instead of the
> `KStream#cogroup` call, to be consistent with other aggregate functions.
>
> BTW, another motivation for me to suggest keeping the initializer on the
> first stream is that, from reviewing the PR (some time ago, so again I
> might be wrong), we will trigger the initializer only when we receive an
> incoming record from the first stream whose key is not in the state store
> yet, while for the other streams such records will just be dropped on the
> floor. If that is actually not the case, i.e. we call the initializer on
> incoming records from any one of the co-grouped streams, then I'm open to
> setting the initializer in the `aggregate` call as well.
>
>
> Guozhang
>
> On Wed, Jun 14, 2017 at 2:23 PM, Guozhang Wang wrote:
>
> > I'd suggest we do not block this KIP until the serde work has been sorted
> > out: we cannot estimate yet how long it will take. Instead, let's just
> > agree on where we want to specify the serdes: either on the first
> > co-group call or on the aggregate call.
> >
> > Also, about the initializer specification: I actually felt that the first
> > cogrouped stream is special (Kyle, please feel free to correct me if I'm
> > wrong), and that is why I thought it is better to specify the initializer
> > at the beginning. From the typing you can see that the final aggregated
> > value type is determined by the first co-grouped stream, and for any
> > intermediate stream to co-group, its value type is not inherited; instead,
> > its values are "incorporated" into the original stream:
> >
> >  <T> CogroupedKStream<K, V> cogroup(final KGroupedStream<K, T> groupedStream,
> >          final Aggregator<? super K, ? super T, V> aggregator)
> >
> > Note that we do not have a cogroup function that returns
> > CogroupedKStream<K, T>.
> >
> >
> > Guozhang
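The typing argument can be illustrated with a small, self-contained sketch. `Cogrouped` and `Aggregator` below are hypothetical stand-ins shaped after the signature quoted above (with the `(key, value, aggregate) -> aggregate` shape), not the Kafka classes. For brevity, the builder is mutable and inputs are plain lists processed one after another rather than interleaved. The sketch shows the aggregate type `VA` being fixed once while each added input brings its own value type `T`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Supplier;

class TypingDemo {
    // Hypothetical stand-in with the (key, value, aggregate) -> aggregate shape.
    interface Aggregator<K, V, VA> {
        VA apply(K key, V value, VA aggregate);
    }

    // Hypothetical cogroup builder: VA is fixed up front; each cogroup() call may bring
    // an input with a different value type T, folded into VA by its own aggregator.
    static final class Cogrouped<K, VA> {
        private final List<BiConsumer<Map<K, VA>, Supplier<VA>>> inputs = new ArrayList<>();

        <T> Cogrouped<K, VA> cogroup(List<Map.Entry<K, T>> records,
                                     Aggregator<? super K, ? super T, VA> aggregator) {
            inputs.add((table, init) -> {
                for (Map.Entry<K, T> r : records) {
                    VA current = table.containsKey(r.getKey()) ? table.get(r.getKey()) : init.get();
                    table.put(r.getKey(), aggregator.apply(r.getKey(), r.getValue(), current));
                }
            });
            return this;
        }

        Map<K, VA> aggregate(Supplier<VA> initializer) {
            Map<K, VA> table = new HashMap<>();
            for (BiConsumer<Map<K, VA>, Supplier<VA>> input : inputs) {
                input.accept(table, initializer);
            }
            return table;
        }
    }

    // The single aggregate type shared by all inputs.
    static final class Summary {
        final int views;
        final int spend;
        Summary(int views, int spend) { this.views = views; this.spend = spend; }
        @Override public String toString() { return "views=" + views + ", spend=" + spend; }
    }

    public static void main(String[] args) {
        Map<String, Summary> table = new Cogrouped<String, Summary>()
            // Input 1: page views, value type String.
            .cogroup(List.of(Map.entry("alice", "/home"), Map.entry("alice", "/cart")),
                     (key, page, s) -> new Summary(s.views + 1, s.spend))
            // Input 2: purchases, value type Integer.
            .cogroup(List.of(Map.entry("alice", 30)),
                     (key, amount, s) -> new Summary(s.views, s.spend + amount))
            .aggregate(() -> new Summary(0, 0));
        System.out.println(table.get("alice")); // views=2, spend=30
    }
}
```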
> >
> >
> > On Tue, Jun 13, 2017 at 2:31 PM, Bill Bejeck wrote:
> >
> >> +1 on deferring discussion on Serdes until API improvements are ironed
> >> out.
> >>
> >> On Tue, Jun 13, 2017 at 2:06 PM, Matthias J. Sax wrote:
> >>
> >> > Hi,
> >> >
> >> > I am just catching up on this thread. (1) As most people agree, we
> >> > should not add anything to KStreamBuilder (btw: we actually plan to move
> >> > #merge() to KStream and deprecate it on KStreamBuilder, as it's quite an
> >> > unnatural API atm).
> >> >
> >> > About specifying Serdes: there is still the idea to improve the overall
> >> > API from the current "we are adding more overloads" pattern to a
> >> > builder-like pattern. This might make the whole discussion void if we do
> >> > this. Thus, it might make sense to keep this in mind (or even delay this
> >> > KIP?). It seems a waste of time to discuss all this if we are going to
> >> > change it in 2 months anyway... Just saying.
> >> >
> >> >
> >> > -Matthias
> >> >
> >> > On 6/13/17 8:05 AM, Michal Borowiecki wrote:
> >> > > You're right, I haven't thought of that.
> >> > >
> >> > > Cheers,
> >> > >
> >> > > Michał
> >> > >
> >> > >
> >> > > On 13/06/17 13:00, Kyle Winkelman wrote:
> >> > >> First, I would prefer not calling it aggregate because there are
> >> > >> already plenty of aggregate methods.
> >> > >>
> >> > >> Second, I don't think this would really work, because after each
> >> > >> aggregate you now have a unique KTable (someone may want a table with 4
> >> > >> streams and reuse those 4 in another table but with one more stream
> >> > >> added), and unless we completely duplicate everything every time, this
> >> > >> isn't really possible. Additionally, the cogroup way just requires one
> >> > >> more call to create each different kind of table (normal, windowed, and
> >> > >> session windowed), whereas this new way would require copying the
> >> > >> aggregate chain.
> >> > >>
> >> > >> Another way to think about it is that with cogroup we know that when
> >> > >> they call aggregate they aren't going to be adding any more aggregators
> >> > >> to that table, but your way requires us to assume they are done adding
> >> > >> aggregators after each call, so we must create and return a KTable
> >> > >> after every call even though it may never be needed.
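Kyle's reuse argument can be sketched with a hypothetical immutable builder. This is a toy, not the Kafka Streams API: `Cogrouped`, `Input`, and the `Long` sum aggregate are illustrative assumptions, and two shared inputs stand in for the "4 streams" in the message. Because `cogroup()` returns a new builder and only `aggregate()` produces a table, a shared prefix of inputs can feed several tables. Each `aggregate()` call can also take its own initializer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Supplier;

class ReuseDemo {
    // One co-grouped input: its (key, value) records plus the aggregator that folds them in.
    static final class Input {
        final List<Map.Entry<String, Integer>> records;
        final BiFunction<Integer, Long, Long> aggregator;

        Input(List<Map.Entry<String, Integer>> records, BiFunction<Integer, Long, Long> aggregator) {
            this.records = records;
            this.aggregator = aggregator;
        }
    }

    // Hypothetical immutable builder: cogroup() returns a NEW builder, so a shared prefix
    // of inputs can be reused; aggregate() is the terminal call that builds a table.
    static final class Cogrouped {
        private final List<Input> inputs;

        Cogrouped(List<Input> inputs) {
            this.inputs = List.copyOf(inputs);
        }

        Cogrouped cogroup(List<Map.Entry<String, Integer>> records,
                          BiFunction<Integer, Long, Long> aggregator) {
            List<Input> next = new ArrayList<>(inputs);
            next.add(new Input(records, aggregator));
            return new Cogrouped(next);
        }

        Map<String, Long> aggregate(Supplier<Long> initializer) {
            Map<String, Long> table = new HashMap<>();
            for (Input in : inputs) {
                for (Map.Entry<String, Integer> r : in.records) {
                    Long current = table.containsKey(r.getKey()) ? table.get(r.getKey()) : initializer.get();
                    table.put(r.getKey(), in.aggregator.apply(r.getValue(), current));
                }
            }
            return table;
        }
    }

    public static void main(String[] args) {
        BiFunction<Integer, Long, Long> sum = (v, agg) -> agg + v;
        Cogrouped base = new Cogrouped(List.of())
            .cogroup(List.of(Map.entry("k", 1)), sum)
            .cogroup(List.of(Map.entry("k", 2)), sum);
        // Reuse the two-input prefix and add a third input for a second table.
        Cogrouped extended = base.cogroup(List.of(Map.entry("k", 10)), sum);

        System.out.println(base.aggregate(() -> 0L).get("k"));     // 3
        System.out.println(extended.aggregate(() -> 0L).get("k")); // 13
        System.out.println(base.aggregate(() -> 100L).get("k"));   // 103 (different initializer)
    }
}
```

The design choice being illustrated: since adding an input never mutates an existing builder, `base` stays valid after `extended` is derived from it. Nothing is materialized until `aggregate()` is called.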
> >> > >>
> >> > >> On Jun 13, 2017 5:20 AM, "Michal Borowiecki" wrote:
> >> > >>
> >> > >>> Actually, just had a thought. It started with naming.
> >> > >>>
> >> > >>> Are we actually co-grouping these streams or are we co-aggregating them?
> >> > >>>
> >> > >>> After all, in each of the cogroup calls we are providing an Aggregator
> >> > >>> implementation.
> >> > >>>
> >> > >>>
> >> > >>> If they are really co-aggregated, why don't we turn this around:
> >> > >>> KGroupedStream<K, V1> grouped1 = builder.stream("topic1").groupByKey();
> >> > >>> KGroupedStream<K, V2> grouped2 = builder.stream("topic2").groupByKey();
> >> > >>> KGroupedStream<K, V3> grouped3 = builder.stream("topic3").groupByKey();
> >> > >>>
> >> > >>> KTable<K, CG> coagg = grouped1.aggregate(initializer1, aggregator1, aggValueSerde1) // this is the unchanged aggregate method
> >> > >>>         .aggregate(grouped2, aggregator2)  // this is a new method
> >> > >>>         .aggregate(grouped3, aggregator3); // ditto
> >> > >>>
> >> > >>> This means that instead of adding cogroup methods to the KGroupedStream
> >> > >>> interface, we would add an aggregate method to the KTable interface.
> >> > >>>
> >> > >>> Is that feasible?
> >> > >>>
> >> > >>> Cheers,
> >> > >>> Michał
> >> > >>>
> >> > >>> On 13/06/17 10:56, Michal Borowiecki wrote:
> >> > >>>
> >> > >>> Also, I still feel that putting the initializer on the first cogroup can
> >> > >>> mislead users into thinking the first stream is in some way special.
> >> > >>> Just my 5c.
> >> > >>> Michał
> >> > >>>
> >> > >>> On 13/06/17 09:54, Michal Borowiecki wrote:
> >> > >>>
> >> > >>> Agree completely with the argument for serdes belonging in the same place
> >> > >>> as the state store name, which is in the aggregate method.
> >> > >>>
> >> > >>> Cheers,
> >> > >>>
> >> > >>> Michał
> >> > >>>
> >> > >>> On 12/06/17 18:20, Xavier Léauté wrote:
> >> > >>>
> >> > >>> I think we are discussing two separate things here, so it might be worth
> >> > >>> clarifying:
> >> > >>>
> >> > >>> 1) The position of the initializer with respect to the aggregators. If I
> >> > >>> understand correctly, Guozhang seems to think it is more natural to
> >> > >>> specify the initializer first, despite it not bearing any relation to the
> >> > >>> first aggregator. I can see the argument for specifying the initializer
> >> > >>> first, but I think it is debatable whether mixing it into the first
> >> > >>> cogroup call leads to a cleaner API or not.
> >> > >>>
> >> > >>> 2) Where the serde should be defined (if necessary). Looking at our
> >> > >>> existing APIs in KGroupedStream, we always offer two aggregate()
> >> > >>> methods. The first one takes the name of the store and the associated
> >> > >>> aggregate value serde, e.g.
> >> > >>>   KGroupedStream.aggregate(Initializer<VR> initializer,
> >> > >>>       Aggregator<? super K, ? super V, VR> aggregator,
> >> > >>>       Serde<VR> aggValueSerde, String queryableStoreName)
> >> > >>> The second one only takes a state store supplier and does not specify
> >> > >>> any serde, e.g.
> >> > >>>   KGroupedStream.aggregate(Initializer<VR> initializer,
> >> > >>>       Aggregator<? super K, ? super V, VR> aggregator,
> >> > >>>       final StateStoreSupplier<KeyValueStore> storeSupplier)
> >> > >>> Presumably, when specifying a state store supplier it shouldn't be
> >> > >>> necessary to specify an aggregate value serde, since the provided
> >> > >>> state store might not need to serialize the values (e.g. it may just keep
> >> > >>> them as regular objects on the heap) or it may have its own internal
> >> > >>> serialization format.
> >> > >>>
> >> > >>> For consistency I think it would be valuable to preserve the same two
> >> > >>> aggregate methods for cogroup as well. Since the serde is only required
> >> > >>> in one of the two cases, I believe the serde has no place in the first
> >> > >>> cogroup() call and should only have to be specified as part of the
> >> > >>> aggregate() method that takes a state store name. In the case of a state
> >> > >>> store supplier, no serde would be necessary.
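Xavier's point about the two overloads can be sketched as follows. `Serde`, `Store`, `HeapStore`, `BytesStore`, `LongSerde`, and the two static `aggregate` overloads are hypothetical stand-ins, not the `org.apache.kafka` types. The `ByteBuffer`-based `Long` serde is an illustrative assumption. The overload that takes a store name must build a byte-oriented store itself and therefore needs a serde. The overload that takes a supplier leaves storage, and any serialization, to the supplied store.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Supplier;

class OverloadDemo {
    // Hypothetical stand-ins, NOT the org.apache.kafka types of similar names.
    interface Serde<T> {
        byte[] serialize(T value);
        T deserialize(byte[] bytes);
    }

    interface Store<V> {
        void put(String key, V value);
        V get(String key);
    }

    // Keeps plain objects on the heap: no serde needed.
    static final class HeapStore<V> implements Store<V> {
        private final Map<String, V> map = new HashMap<>();
        public void put(String key, V value) { map.put(key, value); }
        public V get(String key) { return map.get(key); }
    }

    // Keeps bytes only, so it cannot work without a serde.
    static final class BytesStore<V> implements Store<V> {
        private final Map<String, byte[]> map = new HashMap<>();
        private final Serde<V> serde;
        BytesStore(Serde<V> serde) { this.serde = serde; }
        public void put(String key, V value) { map.put(key, serde.serialize(value)); }
        public V get(String key) {
            byte[] bytes = map.get(key);
            return bytes == null ? null : serde.deserialize(bytes);
        }
    }

    static final class LongSerde implements Serde<Long> {
        public byte[] serialize(Long value) { return ByteBuffer.allocate(Long.BYTES).putLong(value).array(); }
        public Long deserialize(byte[] bytes) { return ByteBuffer.wrap(bytes).getLong(); }
    }

    // Shared folding logic used by both overloads.
    private static <V> Store<V> fold(List<Map.Entry<String, Integer>> records, Supplier<V> initializer,
                                     BiFunction<Integer, V, V> aggregator, Store<V> store) {
        for (Map.Entry<String, Integer> r : records) {
            V current = store.get(r.getKey());
            if (current == null) current = initializer.get();
            store.put(r.getKey(), aggregator.apply(r.getValue(), current));
        }
        return store;
    }

    // Overload 1: store name + value serde; the "library" builds a byte-oriented store itself.
    // (The store name is unused in this toy.)
    static <V> Store<V> aggregate(List<Map.Entry<String, Integer>> records, Supplier<V> initializer,
                                  BiFunction<Integer, V, V> aggregator, String storeName, Serde<V> aggValueSerde) {
        return fold(records, initializer, aggregator, new BytesStore<>(aggValueSerde));
    }

    // Overload 2: the caller supplies the store, so no serde parameter is needed.
    static <V> Store<V> aggregate(List<Map.Entry<String, Integer>> records, Supplier<V> initializer,
                                  BiFunction<Integer, V, V> aggregator, Supplier<Store<V>> storeSupplier) {
        return fold(records, initializer, aggregator, storeSupplier.get());
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> records = List.of(Map.entry("k", 2), Map.entry("k", 5));
        Store<Long> named = aggregate(records, () -> 0L, (v, agg) -> agg + v, "sum-store", new LongSerde());
        Store<Long> supplied = aggregate(records, () -> 0L, (v, agg) -> agg + v, HeapStore::new);
        System.out.println(named.get("k") + " " + supplied.get("k")); // 7 7
    }
}
```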
> >> > >>>
> >> > >>>
> >> > >>> On Sat, Jun 10, 2017 at 4:09 PM Guozhang Wang wrote:
> >> > >>>
> >> > >>>> I'd agree that the aggregate value serde and the initializer do not
> >> > >>>> bear a direct relationship to the first `cogroup` call, but after I
> >> > >>>> tried to write some example code with these two different sets of APIs,
> >> > >>>> I felt the current APIs just program more naturally.
> >> > >>>>
> >> > >>>> I know it is kinda subjective, but I do think that user experience may
> >> > >>>> be more important as a deciding factor than the logical argument for
> >> > >>>> public interfaces. So I'd recommend that people also try writing some
> >> > >>>> example lines, and we can circle back and discuss which one feels more
> >> > >>>> natural to write code with.
> >> > >>>>
> >> > >>>>
> >> > >>>> Guozhang
> >> > >>>>
> >> > >>>> On Fri, Jun 9, 2017 at 1:59 AM, Michal Borowiecki wrote:
> >> > >>>>
> >> > >>>>> I feel it would make more sense to move the initializer and serde to the
> >> > >>>>> final aggregate statement, since the serde only applies to the state
> >> > >>>>> store, and the initializer doesn't bear any relation to the first group
> >> > >>>>> in particular.
> >> > >>>>>
> >> > >>>>> +1 for moving initializer and serde from cogroup() to the aggregate()
> >> > >>>>> for the reasons mentioned above.
> >> > >>>>>
> >> > >>>>> Cheers,
> >> > >>>>>
> >> > >>>>> Michał
> >> > >>>>>
> >> > >>>>> On 08/06/17 22:44, Guozhang Wang wrote:
> >> > >>>>>
> >> > >>>>> Note that although the internal `AbstractStoreSupplier` does maintain the
> >> > >>>>> key-value serdes, we do not enforce the interface of `StateStoreSupplier`
> >> > >>>>> to always retain that information, and hence we cannot assume that
> >> > >>>>> StateStoreSuppliers always retain key / value serdes.
> >> > >>>>>
> >> > >>>>> On Thu, Jun 8, 2017 at 11:51 AM, Xavier Léauté wrote:
> >> > >>>>>
> >> > >>>>>
> >> > >>>>> Another reason for the serde not to be in the first cogroup call is that
> >> > >>>>> the serde should not be required if you pass a StateStoreSupplier to
> >> > >>>>> aggregate().
> >> > >>>>>
> >> > >>>>> Regarding the aggregated type <T>, I don't see why the initializer should
> >> > >>>>> be favored over the aggregator to define the type. In my mind, separating
> >> > >>>>> the initializer into the last aggregate call clearly indicates that the
> >> > >>>>> initializer is independent of any of the aggregators or streams and that
> >> > >>>>> we don't wait for grouped1 events to initialize the co-group.
> >> > >>>>>
> >> > >>>>> On Thu, Jun 8, 2017 at 11:14 AM Guozhang Wang wrote:
> >> > >>>>>
> >> > >>>>>
> >> > >>>>> On second thought... This is the current proposal API:
> >> > >>>>>
> >> > >>>>> ```
> >> > >>>>> <T> CogroupedKStream<K, T> cogroup(final Initializer<T> initializer,
> >> > >>>>>         final Aggregator<? super K, ? super V, T> aggregator,
> >> > >>>>>         final Serde<T> aggValueSerde)
> >> > >>>>> ```
> >> > >>>>>
> >> > >>>>> If we do not have the initializer in the first co-group, might it be a
> >> > >>>>> bit awkward for users to specify the aggregator that returns a typed <T>
> >> > >>>>> value?
> >> > >>>>>
> >> > >>>>> Maybe it is still better to put these two functions in the same API?
> >> > >>>>>
> >> > >>>>>
> >> > >>>>>
> >> > >>>>> Guozhang
> >> > >>>>>
> >> > >>>>> On Thu, Jun 8, 2017 at 11:08 AM, Guozhang Wang wrote:
> >> > >>>>>
> >> > >>>>> This suggestion lgtm. I would vote for the first alternative rather than
> >> > >>>>> adding it to the `KStreamBuilder`, though.
> >> > >>>>>
> >> > >>>>> On Thu, Jun 8, 2017 at 10:58 AM, Xavier Léauté wrote:
> >> > >>>>>
> >> > >>>>>
> >> > >>>>> I have a minor suggestion to make the API a little bit more symmetric.
> >> > >>>>> I feel it would make more sense to move the initializer and serde to the
> >> > >>>>> final aggregate statement, since the serde only applies to the state
> >> > >>>>> store, and the initializer doesn't bear any relation to the first group
> >> > >>>>> in particular. It would end up looking like this:
> >> > >>>>>
> >> > >>>>> KTable<K, CG> cogrouped =
> >> > >>>>>     grouped1.cogroup(aggregator1)
> >> > >>>>>             .cogroup(grouped2, aggregator2)
> >> > >>>>>             .cogroup(grouped3, aggregator3)
> >> > >>>>>             .aggregate(initializer1, aggValueSerde, storeName1);
> >> > >>>>>
> >> > >>>>> Alternatively, we could move the first cogroup() method to
> >> > >>>>> KStreamBuilder, similar to how we have .merge(),
> >> > >>>>> and end up with an API that would be even more symmetric.
> >> > >>>>>
> >> > >>>>> KStreamBuilder.cogroup(grouped1, aggregator1)
> >> > >>>>>               .cogroup(grouped2, aggregator2)
> >> > >>>>>               .cogroup(grouped3, aggregator3)
> >> > >>>>>               .aggregate(initializer1, aggValueSerde, storeName1);
> >> > >>>>>
> >> > >>>>> This doesn't have to be a blocker, but I thought it would make the API
> >> > >>>>> just a tad cleaner.
> >> > >>>>>
> >> > >>>>> On Tue, Jun 6, 2017 at 3:59 PM Guozhang Wang wrote:
> >> > >>>>>
> >> > >>>>> Kyle,
> >> > >>>>>
> >> > >>>>> Thanks a lot for the updated KIP. It looks good to me.
> >> > >>>>>
> >> > >>>>>
> >> > >>>>> Guozhang
> >> > >>>>>
> >> > >>>>>
> >> > >>>>> On Fri, Jun 2, 2017 at 5:37 AM, Jim Jagielski wrote:
> >> > >>>>>
> >> > >>>>> This makes much more sense to me. +1
> >> > >>>>>
> >> > >>>>>
> >> > >>>>> On Jun 1, 2017, at 10:33 AM, Kyle Winkelman wrote:
> >> > >>>>>
> >> > >>>>> I have updated the KIP and my PR. Let me know what you think.
> >> > >>>>> To create a cogrouped stream, just call cogroup on a KGroupedStream and
> >> > >>>>> supply the initializer, aggValueSerde, and an aggregator. Then continue
> >> > >>>>> adding KGroupedStreams and aggregators. Then call one of the many
> >> > >>>>> aggregate calls to create a KTable.
> >> > >>>>>
> >> > >>>>> Thanks,
> >> > >>>>> Kyle
> >> > >>>>>
> >> > >>>>> On Jun 1, 2017 4:03 AM, "Damian Guy" wrote:
> >> > >>>>>
> >> > >>>>> Hi Kyle,
> >> > >>>>>
> >> > >>>>> Thanks for the update. I think just one initializer makes sense, as it
> >> > >>>>> should only be called once per key and generally it is just going to
> >> > >>>>> create a new instance of whatever the Aggregate class is.
> >> > >>>>>
> >> > >>>>> Cheers,
> >> > >>>>> Damian
> >> > >>>>>
> >> > >>>>> On Wed, 31 May 2017 at 20:09 Kyle Winkelman wrote:
> >> > >>>>>
> >> > >>>>>
> >> > >>>>> Hello all,
> >> > >>>>>
> >> > >>>>> I have spent some more time on this, and the best alternative I have
> >> > >>>>> come up with is:
> >> > >>>>> KGroupedStream has a single cogroup call that takes an initializer and
> >> > >>>>> an aggregator.
> >> > >>>>> CogroupedKStream has a cogroup call that takes additional
> >> > >>>>> groupedStream/aggregator pairs.
> >> > >>>>> CogroupedKStream has multiple aggregate methods that create the
> >> > >>>>> different stores.
> >> > >>>>>
> >> > >>>>> I plan on updating the KIP, but I want people's input on whether the
> >> > >>>>> initializer should be passed in once at the beginning or instead be
> >> > >>>>> required for each call to one of the aggregate methods. The first makes
> >> > >>>>> more sense to me but doesn't allow the user to specify different
> >> > >>>>> initializers for different tables.
> >> > >>>>>
> >> > >>>>> Thanks,
> >> > >>>>> Kyle
> >> > >>>>>
> >> > >>>>> On May 24, 2017 7:46 PM, "Kyle Winkelman" wrote:
> >> > >>>>>
> >> > >>>>>
> >> > >>>>> Yeah, I really like that idea. I'll see what I can do to update the KIP
> >> > >>>>> and my PR when I have some time. I'm not sure how well creating the
> >> > >>>>> KStreamAggregates will go, though, because at that point I will have
> >> > >>>>> thrown away the type of the values. It will be type-safe; I just may
> >> > >>>>> need to do a little forcing.
> >> > >>>>>
> >> > >>>>> Thanks,
> >> > >>>>> Kyle
> >> > >>>>>
> >> > >>>>> On May 24, 2017 3:28 PM, "Guozhang Wang" wrote:
> >> > >>>>>
> >> > >>>>> Kyle,
> >> > >>>>>
> >> > >>>>> Thanks for the explanations; my previous reading of the wiki examples
> >> > >>>>> was wrong.
> >> > >>>>>
> >> > >>>>> So I guess my motivation should be "reduced" to: can we move the window
> >> > >>>>> specs param from "KGroupedStream#cogroup(..)" to
> >> > >>>>> "CogroupedKStream#aggregate(..)"? My motivations are:
> >> > >>>>>
> >> > >>>>> 1. minor: we can reduce the number of generics in CogroupedKStream from
> >> > >>>>> 3 to 2.
> >> > >>>>>
> >> > >>>>> 2. major: this is for extensibility of the APIs, and since we are
> >> > >>>>> removing the "Evolving" annotations on Streams it may be harder to
> >> > >>>>> change it again in the future. The extended use cases are that people
> >> > >>>>> want windowed running aggregates at different granularities, e.g.
> >> > >>>>> "give me the counts per-minute, per-hour, per-day and per-week", and
> >> > >>>>> today in the DSL we need to specify that case in multiple aggregate
> >> > >>>>> operators, each of which gets its own state store / changelog, etc. It
> >> > >>>>> is possible to optimize it into a single state store as well. Its
> >> > >>>>> implementation would be tricky, as you need to contain windows of
> >> > >>>>> different lengths within your window store, but just from the public
> >> > >>>>> API point of view, it could be specified as:
> >> > >>>>>
> >> > >>>>> CogroupedKStream stream = stream1.cogroup(stream2, ... "state-store-name");
> >> > >>>>>
> >> > >>>>> table1 = stream.aggregate(/*per-minute window*/)
> >> > >>>>> table2 = stream.aggregate(/*per-hour window*/)
> >> > >>>>> table3 = stream.aggregate(/*per-day window*/)
> >> > >>>>>
> >> > >>>>> while underneath we are only using a single store "state-store-name"
> >> > >>>>> for it.
> >> > >>>>>
> >> > >>>>> Although this feature is out of the scope of this KIP, I'd like to
> >> > >>>>> discuss whether we can "leave the door open" to make such changes
> >> > >>>>> without modifying the public APIs.
> >> > >>>>>
> >> > >>>>> Guozhang
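The single-store, multi-granularity idea can be sketched in plain Java, assuming epoch-aligned tumbling windows and a single `TreeMap` standing in for the one shared store. The composite key `sizeMs@windowStartMs` is a hypothetical layout chosen for illustration, not how Kafka Streams window stores are organized. It only shows that windows of different lengths can live side by side in one store.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MultiGranularityDemo {
    static final long MINUTE = 60_000L;
    static final long HOUR = 60 * MINUTE;
    static final long DAY = 24 * HOUR;

    // Start of the tumbling window of length sizeMs that contains timestampMs
    // (assumes non-negative timestamps and epoch-aligned windows).
    static long windowStart(long timestampMs, long sizeMs) {
        return timestampMs - (timestampMs % sizeMs);
    }

    // Counts each event once per window size, all in ONE shared map; the composite key
    // "sizeMs@windowStartMs" keeps windows of different lengths apart.
    static Map<String, Long> countAll(List<Long> timestamps, long... sizesMs) {
        Map<String, Long> store = new TreeMap<>();
        for (long ts : timestamps) {
            for (long size : sizesMs) {
                store.merge(size + "@" + windowStart(ts, size), 1L, Long::sum);
            }
        }
        return store;
    }

    public static void main(String[] args) {
        // Two events in the first minute and one 90 minutes in.
        List<Long> timestamps = List.of(10_000L, 50_000L, 90 * MINUTE);
        Map<String, Long> counts = countAll(timestamps, MINUTE, HOUR, DAY);
        counts.forEach((key, count) -> System.out.println(key + " -> " + count));
    }
}
```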
> >> > >>>>>
> >> > >>>>>
> >> > >>>>> On Wed, May 24, 2017 at 3:57 AM, Kyle Winkelman wrote:
> >> > >>>>>
> >> > >>>>>
> >> > >>>>> I allow defining a single window/session window one time, when you make
> >> > >>>>> the cogroup call from a KGroupedStream. From then on you are using the
> >> > >>>>> cogroup call from within CogroupedKStream, which doesn't accept any
> >> > >>>>> additional windows/session windows.
> >> > >>>>>
> >> > >>>>> Is this what you meant by your question, or did I misunderstand?
> >> > >>>>>
> >> > >>>>> On May 23, 2017 9:33 PM, "Guozhang Wang" wrote:
> >> > >>>>>
> >> > >>>>> Another question that came to me is on "window alignment": from the KIP
> >> > >>>>> it seems you are allowing users to specify a (potentially different)
> >> > >>>>> window spec in each co-grouped input stream. So if these window specs
> >> > >>>>> are different, how should we "align" them across the different input
> >> > >>>>> streams? I think it is more natural to only specify one window spec in
> >> > >>>>>
> >> > >>>>> KTable<RK, V> CogroupedKStream#aggregate(Windows);
> >> > >>>>>
> >> > >>>>> and remove it from the cogroup() functions. WDYT?
> >> > >>>>>
> >> > >>>>>
> >> > >>>>> Guozhang
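The alignment problem shows up clearly with numbers. This sketch assumes epoch-aligned tumbling windows, and the 5- and 7-minute sizes are arbitrary. With per-stream window specs, two records for the same key fall into windows with different boundaries, so there is no single window to combine them into. A single spec at `aggregate()` gives every input the same boundaries.

```java
class AlignmentDemo {
    static final long MINUTE = 60_000L;

    // Start of the epoch-aligned tumbling window of length sizeMs containing timestampMs.
    static long windowStart(long timestampMs, long sizeMs) {
        return timestampMs - (timestampMs % sizeMs);
    }

    public static void main(String[] args) {
        long fromA = 11 * MINUTE; // record for key k on stream A
        long fromB = 13 * MINUTE; // record for key k on stream B

        // Per-stream specs (A: 5 min, B: 7 min): the records land in [10, 15) and [7, 14),
        // windows with different boundaries, so there is no single window to merge into.
        System.out.println(windowStart(fromA, 5 * MINUTE) / MINUTE + " vs "
            + windowStart(fromB, 7 * MINUTE) / MINUTE); // 10 vs 7

        // One spec at aggregate() (5 min for both inputs): both records land in [10, 15).
        System.out.println(windowStart(fromA, 5 * MINUTE) / MINUTE + " vs "
            + windowStart(fromB, 5 * MINUTE) / MINUTE); // 10 vs 10
    }
}
```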
> >> > >>>>>
> >> > >>>>> On Tue, May 23, 2017 at 6:22 PM, Guozhang Wang wrote:
> >> > >>>>>
> >> > >>>>> Thanks for the proposal, Kyle. This is quite a common use case: to
> >> > >>>>> support such a multi-way table join (i.e. N source tables with N
> >> > >>>>> aggregate funcs) with a single store and N+1 serdes. I have seen lots of
> >> > >>>>> people using the low-level PAPI to achieve this goal.
> >> > >>>>>
> >> > >>>>>
> >> > >>>>> On Fri, May 19, 2017 at 10:04 AM, Kyle Winkelman wrote:
> >> > >>>>>
> >> > >>>>> I like your point about not handling other cases such as count and
> >> > >>>>> reduce.
> >> > >>>>>
> >> > >>>>> I think that reduce may not make sense because reduce assumes that the
> >> > >>>>> input values are the same as the output values. With cogroup
> >> > >>>>> ...
> >> > >
> >> >
> >> >
> >>
> >
> >
> >
> > --
> > -- Guozhang
> >
>
>
>
> --
> -- Guozhang
>
