Re: [VOTE] KIP-155 Add range scan for windowed state stores

Michal Borowiecki Thu, 11 May 2017 00:45:07 -0700

Also, wrt

In the case of the window store, the "key" of the single-key iterator is
the actual timestamp of the underlying entry, not just range of thewindow,
so if we were to wrap the result key a window we wouldn't be getting back
the equivalent of the single key iterator.

I believe the timestamp in the entry *is* the window start time (the endtime is implicitly known by adding the window size to the window start time)

https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/kstream/internals/KStreamWindowReduce.java#L109


https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/kstream/internals/KStreamWindowAggregate.java#L111

Both use window.start() as the timestamp when storing into the WindowStore.

Or am I confusing something horribly here? Hope not ;-)

If the above is correct, then using KeyValueIterator<Windowed<K>, V> asthe return type of the new fetch method would indeed not lose anythingthe single key iterator is offering.

The window end time could simply be calculated as window start time +window size (window size would have to be passed from the store supplierto the store implementation, which I think it isn't now but that's animplementation detail).

If you take objection to exposing the window end time because the singlekey iterator doesn't do that, then an alternative could also be to havethe return type of the new fetch be something likeKeyValueItarator<Tuple2<K, Long>, V>, since the key is composed of theactual key and the timestamp together. peakNextKey() would then allowyou to glimpse both the actual key and the associated window start time.This feels like a better workaround then putting the KeyValue pair inthe V of the WindowStoreIterator<V>.

All-in-all, I'd still prefer KeyValueIterator<Windowed<K>, V> as it moreclearly names what's what.


What do you think?

Thanks,

Michal

On 11/05/17 07:51, Michal Borowiecki wrote:

Well, another concern, apart from potential confusion, is that youwon't be able to peek the actual next key, just the timestamp. So thetradeoff is between having consistency in return types versusconsistency in having the ability to know the next key without movingthe iterator. To me the latter just feels more important.


Thanks,
Michal
On 11 May 2017 12:46 a.m., Xavier Léauté <xav...@confluent.io> wrote:

    Thank you for the feedback Michal.

    While I agree the return may be a little bit more confusing to reason
    about, the reason for doing so was to keep the range query interfaces
    consistent with their single-key counterparts.

    In the case of the window store, the "key" of the single-key
    iterator is
    the actual timestamp of the underlying entry, not just range of
    the window,
    so if we were to wrap the result key a window we wouldn't be
    getting back
    the equivalent of the single key iterator.

    In both cases peekNextKey is just returning the timestamp of the
    next entry
    in the window store that matches the query.

    In the case of the session store, we already return Windowed<K>
    for the
    single-key method, so it made sense there to also return
    Windowed<K> for
    the range method.

    Hope this make sense? Let me know if you still have concerns about
    this.

    Thank you,
    Xavier

    On Wed, May 10, 2017 at 12:25 PM Michal Borowiecki <
    michal.borowie...@openbet.com> wrote:

    > Apologies, I missed the discussion (or lack thereof) about the
    return
    > type of:
    >
    > WindowStoreIterator<KeyValue<K, V>> fetch(K from, K to, long
    timeFrom,
    > long timeTo)
    >
    >
    > WindowStoreIterator<V> (as the KIP mentions) is a subclass of
    > KeyValueIterator<Long, V>
    >
    > KeyValueIterator<K,V> has the following method:
    >
    > /** * Peek at the next key without advancing the iterator *
    @return the
    > key of the next value that would be returned from the next call
    to next
    > */ K peekNextKey();
    >
    > Given the type in this case will be Long, I assume what it would
    return
    > is the window timestamp of the next found record?
    >
    >
    > In the case of WindowStoreIterator<V> fetch(K key, long
    timeFrom, long
    > timeTo);
    > all records found by fetch have the same key, so it's harmless
    to return
    > the timestamp of the next found window but here we have varying
    keys and
    > varying windows, so won't it be too confusing?
    >
    > KeyValueIterator<Windowed<K>, V> (as in the proposed
    > ReadOnlySessionStore.fetch) just feels much more intuitive.
    >
    > Apologies again for jumping onto this only once the voting has
    already
    > begun.
    > Thanks,
    > Michał
    >
    > On 10/05/17 20:08, Sriram Subramanian wrote:
    > > +1
    > >
    > > On Wed, May 10, 2017 at 11:42 AM, Bill Bejeck
    <bbej...@gmail.com> wrote:
    > >
    > >> +1
    > >>
    > >> Thanks,
    > >> Bill
    > >>
    > >> On Wed, May 10, 2017 at 2:38 PM, Guozhang Wang
    <wangg...@gmail.com>
    > wrote:
    > >>
    > >>> +1. Thank you!
    > >>>
    > >>> On Wed, May 10, 2017 at 11:30 AM, Xavier Léauté
    <xav...@confluent.io>
    > >>> wrote:
    > >>>
    > >>>> Hi everyone,
    > >>>>
    > >>>> Since there aren't any objections to this addition, I would
    like to
    > >> start
    > >>>> the voting on KIP-155 so we can hopefully get this into 0.11.
    > >>>>
    > >>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP+
    > >>>> 155+-+Add+range+scan+for+windowed+state+stores
    > >>>>
    > >>>> Voting will stay active for at least 72 hours.
    > >>>>
    > >>>> Thank you,
    > >>>> Xavier
    > >>>>
    > >>>
    > >>>
    > >>> --
    > >>> -- Guozhang
    > >>>
    >
    >

Re: [VOTE] KIP-155 Add range scan for windowed state stores

Reply via email to