Re: [DISCUSS] KIP-617: Allow Kafka Streams State Stores to be iterated backwards

John Roesler Thu, 02 Jul 2020 06:36:07 -0700

Hi Jorge,

Thanks for the update. I think this is a good plan.


I just took a look at the KIP again, and it looks like there’s a revision error 
in which two methods are proposed for SessionStore, but seem like they should 
be in ReadOnlySessionStore. Do I read that right?

Otherwise, I’m happy with your proposal. 

Thanks,
John

On Wed, Jul 1, 2020, at 17:01, Jorge Esteban Quilcate Otoya wrote:
> Quick update: KIP is updated with latest changes now.
> Will leave it open this week while working on the PR.
> 
> Hope to open a new vote thread over the next few days if no additional
> feedback is provided.
> 
> Cheers,
> Jorge.
> 
> On Mon, Jun 29, 2020 at 11:30 AM Jorge Esteban Quilcate Otoya <
> [email protected]> wrote:
> 
> > Thanks, John!
> >
> > Make sense to reconsider the current approach. I was heading in a similar
> > direction while drafting the implementation. Metered, Caching, and other
> > layers will also have to get duplicated to build up new methods in `Stores`
> > factory, and class casting issues would appear on stores created by DSL.
> >
> > I will draft a proposal with new methods (move methods from proposed
> > interfaces to existing ones) with default implementation in a KIP update
> > and wait for Matthias to chime in to validate this approach.
> >
> > Jorge.
> >
> >
> > On Sat, Jun 27, 2020 at 4:01 PM John Roesler <[email protected]> wrote:
> >
> >> Hi Jorge,
> >>
> >> Sorry for my silence, I've been absorbed with the 2.6 and 2.5.1 releases.
> >>
> >> The idea to separate the new methods into "mixin" interfaces seems
> >> like a good one, but as we've discovered in KIP-614, it doesn't work
> >> out that way in practice. The problem is that the store implementations
> >> are just the base layer that get composed with other layers in Streams
> >> before they can be accessed in the DSL. This is extremely subtle, so
> >> I'm going to put everyone to sleep with a detailed explanation:
> >>
> >> For example, this is the mechanism by which all KeyValueStore
> >> implementations get added to Streams:
> >> org.apache.kafka.streams.state.internals.KeyValueStoreBuilder#build
> >> return new MeteredKeyValueStore<>(
> >>   maybeWrapCaching(maybeWrapLogging(storeSupplier.get())),
> >>   storeSupplier.metricsScope(),
> >>   time,
> >>   keySerde,
> >>   valueSerde
> >> );
> >>
> >> In the DSL, the store that a processor gets from the context would be
> >> the result of this composition. So even if the storeSupplier.get() returns
> >> a store that implements the "reverse" interface, when you try to use it
> >> from a processor like:
> >> org.apache.kafka.streams.kstream.ValueTransformerWithKey#init
> >> ReadOnlyBackwardWindowStore<K, V> store =
> >>   (ReadOnlyBackwardWindowStore<K, V>) context.getStateStore(..)
> >>
> >> You'd just get a ClassCastException because it's actually a
> >> MeteredKeyValueStore, which doesn't implement
> >> ReadOnlyBackwardWindowStore.
> >>
> >> The only way to make this work would be to make the Metered,
> >> Caching, and Logging layers also implement the new interfaces,
> >> but this effectively forces implementations to also implement
> >> the interface. Otherwise, the intermediate layers would have to
> >> cast the store in each method, like this:
> >> MeteredWindowStore#backwardFetch {
> >>   ((ReadOnlyBackwardWindowStore<K, V>) innerStore).backwardFetch(..)
> >> }
> >>
> >> And then if the implementation doesn't "opt in" by implementing
> >> the interface, you'd get a ClassCastException, not when you get the
> >> store, but when you try to use it.
> >>
> >> The fact that we get ClassCastExceptions no matter which way we
> >> turn here indicates that we're really not getting any benefit from the
> >> type system, which makes the extra interfaces seem not worth all the
> >> code involved.
> >>
> >> Where we landed in KIP-614 is that, unless we want to completely
> >> revamp the way that StateStores work in the DSL, you might as
> >> well just add the new methods to the existing interfaces. To prevent
> >> compilation errors, we can add default implementations that throw
> >> UnsupportedOperationException. If a store doesn't opt in by
> >> implementing the methods, you'd get an UnsupportedOperationException,
> >> which seems no worse, and maybe better, than the ClassCastException
> >> you'd get if we go with the "mixin interface" approach.
> >>
> >> A quick note: This entire discussion focuses on the DSL. If you're just
> >> using the Processor API by directly adding the a custom store to the
> >> Topology:
> >> org.apache.kafka.streams.Topology#addStateStore
> >> and then retrieving it in the processor via:
> >> org.apache.kafka.streams.processor.ProcessorContext#getStateStore
> >> in
> >> org.apache.kafka.streams.processor.Processor#init
> >>
> >> Then, you can both register and retrieve _any_ StateStore implementation.
> >> There's no need to use KeyValueStore or any other built-in interface.
> >> In other words, KeyValueStore and company are only part of the DSL,
> >> not the PAPI. So, discussions about the build-in store interfaces are only
> >> really relevant in the context of the DSL, Transformers, and Materialized.
> >>
> >> So, in conclusion, I'd really recommend just adding any new methods to
> >> the existing store interfaces. We might be able to revamp the API in the
> >> future to support mixins, but it's a much larger scope of work than this
> >> KIP.
> >> A more minor comment is that we don't need to add Deprecated variants
> >> of new methods.
> >>
> >> Thanks again, and once again, I'm sorry I tuned out and didn't offer this
> >> feedback before you revised the KIP.
> >> -John
> >>
> >>
> >>
> >>
> >> On Mon, Jun 22, 2020, at 06:11, Jorge Esteban Quilcate Otoya wrote:
> >> > Hi everyone,
> >> >
> >> > I've updated the KIP, applying Matthias' feedback regarding interface
> >> > hierarchy.
> >> >
> >> > Also, following the last email, I think we can consider reverse
> >> operations
> >> > on KeyValue range as well, as implementation supports lexicographic
> >> order.
> >> >
> >> > I considered different naming between Key-based ranges and Time-based
> >> > ranges, and mitigate confusion when fetching keys and time ranges as
> >> > WindowStore does:
> >> >
> >> > Key-based ranges: reverseRange(), reverseAll()
> >> > Time-based ranges: backwardFetch()
> >> >
> >> > Then, key-based changes apply to KeyValueStore, and time-based changes
> >> to
> >> > Window and Session stores.
> >> >
> >> > Let me know if you have any questions.
> >> >
> >> > Thanks,
> >> > Jorge.
> >> >
> >> >
> >> > On Tue, Jun 16, 2020 at 12:47 AM Jorge Esteban Quilcate Otoya <
> >> > [email protected]> wrote:
> >> >
> >> > > Hi everyone, sorry for the late reply.
> >> > >
> >> > > Thanks Matthias for your feedback. I think it makes sense to
> >> reconsider
> >> > > the current design based on your input.
> >> > >
> >> > > After digging deeper into the current implementation, I'd like to
> >> bring my
> >> > > current understanding to be double-checked as it might be redefining
> >> the
> >> > > KIP's scope:
> >> > >
> >> > > 1. There are 2 ranges been exposed by different stores:
> >> > >
> >> > > a. Key Range
> >> > > b. Timestamp Range
> >> > >
> >> > > So far, we have discussed covering both.
> >> > >
> >> > > 2. Key Range functions do not provide ordering guarantees by design:
> >> > >
> >> > > ```ReadOnlyKeyValueStore.java
> >> > >     /**
> >> > >      * Get an iterator over a given range of keys. This iterator must
> >> be
> >> > > closed after use.
> >> > >      * The returned iterator must be safe from {@link
> >> > > java.util.ConcurrentModificationException}s
> >> > >      * and must not return null values. No ordering guarantees are
> >> > > provided.
> >> > >      * ...
> >> > >      */
> >> > >      KeyValueIterator<K, V> range(K from, K to);
> >> > > ```
> >> > >
> >> > > Therefore, I'd propose removing Key range operations from the scope.
> >> > >
> >> > > 3. Timestamp Range operations happen at the SegmentsStore level
> >> (internal)
> >> > > API
> >> > >
> >> > > AFAICT, Segments wrappers handle all Timestamp ranges queries.
> >> > >
> >> > > I'd propose extending `Segments#segments(timeFrom, timeTo, backwards)`
> >> > > with a flag for backwards operations.
> >> > >
> >> > > As segments returned will be processed backwards, I'm not extending
> >> > > KeyValueStores to query each segment backwards as previous point 2.
> >> > >
> >> > > 4. Extend WindowStores implementations with a new
> >> > > WindowBackwardStore/ReadOnlyBackwardStore:
> >> > >
> >> > > ```java
> >> > > public interface ReadOnlyWindowBackwardStore<K, V> {
> >> > >     WindowStoreIterator<V> backwardFetch(K key, Instant from, Instant
> >> to)
> >> > > throws IllegalArgumentException;
> >> > >
> >> > >     KeyValueIterator<Windowed<K>, V> backwardFetch(K from, K to,
> >> Instant
> >> > > fromTime, Instant toTime)
> >> > >         throws IllegalArgumentException;
> >> > >
> >> > >     KeyValueIterator<Windowed<K>, V> backwardFetchAll(Instant from,
> >> > > Instant to) throws IllegalArgumentException;
> >> > > ```
> >> > >
> >> > > 5. SessionStore is a bit different as it has fetch/find sessions
> >> spread
> >> > > between SessionStore and ReadOnlySessionStore.
> >> > >
> >> > > I'd propose a new interface `SessionBackwardStore` to expose backward
> >> find
> >> > > operations:
> >> > >
> >> > > ```java
> >> > > public interface SessionBackwardStore<K, AGG> {
> >> > >     KeyValueIterator<Windowed<K>, AGG> backwardFindSessions(final K
> >> key,
> >> > > final long earliestSessionEndTime, final long latestSessionStartTime);
> >> > >
> >> > >     KeyValueIterator<Windowed<K>, AGG> backwardFindSessions(final K
> >> > > keyFrom, final K keyTo, final long earliestSessionEndTime, final long
> >> > > latestSessionStartTime);
> >> > > }
> >> > > ```
> >> > >
> >> > > If this understanding is correct I'll proceed to update the KIP based
> >> on
> >> > > this.
> >> > >
> >> > > Looking forward to your feedback.
> >> > >
> >> > > Thanks,
> >> > > Jorge.
> >> > >
> >> > > On Fri, May 29, 2020 at 3:32 AM Matthias J. Sax <[email protected]>
> >> wrote:
> >> > >
> >> > >> Hey,
> >> > >>
> >> > >> Sorry that I am late to the game. I am not 100% convinced about the
> >> > >> current proposal. Using a new config as feature flag seems to be
> >> rather
> >> > >> "nasty" to me, and flipping from/to is a little bit too fancy for my
> >> > >> personal taste.
> >> > >>
> >> > >> I agree, that the original proposal using a "ReadDirection" enum is
> >> not
> >> > >> ideal either.
> >> > >>
> >> > >> Thus, I would like to put out a new idea: We could add a new
> >> interface
> >> > >> that offers new methods that return revers iterators.
> >> > >>
> >> > >> The KIP already proposes to add `reverseAll()` and it seems backward
> >> > >> incompatible to just add this method to `ReadOnlyKeyValueStore` and
> >> > >> `ReadOnlyWindowStore`. I don't think we could provide a useful
> >> default
> >> > >> implementation for custom stores and thus either break compatibility
> >> or
> >> > >> need add a default that just throws an exception. Neither seems to
> >> be a
> >> > >> good option.
> >> > >>
> >> > >> Using a new interface avoid this issue and allows users implementing
> >> > >> custom stores to opt-in by adding the interface to their stores.
> >> > >> Furthermore, we don't need any config. In the end, we encapsulte the
> >> > >> change into the store, and our runtime is agnostic to it (as it
> >> should
> >> > >> be).
> >> > >>
> >> > >> The hierarchy becomes a little complex (but uses would not really see
> >> > >> the complexity):
> >> > >>
> >> > >> // exsiting
> >> > >> ReadOnlyKeyValueStore
> >> > >> KeyValueStore extend StateStore, ReadOnlyKeyValueStore
> >> > >>
> >> > >>
> >> > >> // helper interface; users don't care
> >> > >> // need similar ones for other stores
> >> > >> ReverseReadOnlyKeyValueStore {
> >> > >>     KeyValueIterator<K, V> reverseRange(K from, K to);
> >> > >>     KeyValueIterator<K, V> reverseAll();
> >> > >> }
> >> > >>
> >> > >>
> >> > >> // two new user facing interfaces for kv-store
> >> > >> // need similar ones for other stores
> >> > >> ReadOnlyKeyValueStoreWithReverseIterators extends
> >> ReadOnlyKeyValueStore,
> >> > >> ReverseReadOnlyKeyValueStore
> >> > >>
> >> > >> KeyValueStoreWithReverseIterators extends KeyValueStore,
> >> > >> ReverseReadOnlyKeyValueStore
> >> > >>
> >> > >>
> >> > >> // updated (also internal)
> >> > >> // also need to update other built-in stores
> >> > >> RocksDB implements KeyValueStoreWithReverseIterators,
> >> BulkLoadingStore
> >> > >>
> >> > >>
> >> > >> In the end, user would only care about the two (for kv-case) new
> >> > >> interface that offer revers iterator (read only and regular) and can
> >> > >> cast stores accordingly in their Processors/Transformers or via IQ.
> >> > >>
> >> > >>
> >> > >> Btw: if we add revers iterator for KeyValue and Window store, should
> >> we
> >> > >> do the same for Session store?
> >> > >>
> >> > >>
> >> > >>
> >> > >> This might be more code to write, but I believe it provides the
> >> better
> >> > >> user experience. Thoughts?
> >> > >>
> >> > >>
> >> > >>
> >> > >> -Matthias
> >> > >>
> >> > >>
> >> > >>
> >> > >>
> >> > >> On 5/26/20 8:47 PM, John Roesler wrote:
> >> > >> > Sorry for my silence, Jorge,
> >> > >> >
> >> > >> > I've just taken another look, and I'm personally happy with the
> >> KIP.
> >> > >> >
> >> > >> > Thanks,
> >> > >> > -John
> >> > >> >
> >> > >> > On Tue, May 26, 2020, at 16:17, Jorge Esteban Quilcate Otoya wrote:
> >> > >> >> If no additional comments, I will proceed to start the a vote
> >> thread.
> >> > >> >>
> >> > >> >> Thanks a lot for your feedback!
> >> > >> >>
> >> > >> >> On Fri, May 22, 2020 at 9:25 AM Jorge Esteban Quilcate Otoya <
> >> > >> >> [email protected]> wrote:
> >> > >> >>
> >> > >> >>> Thanks Sophie. I like the `reverseAll()` idea.
> >> > >> >>>
> >> > >> >>> I updated the KIP with your feedback.
> >> > >> >>>
> >> > >> >>>
> >> > >> >>>
> >> > >> >>> On Fri, May 22, 2020 at 4:22 AM Sophie Blee-Goldman <
> >> > >> [email protected]>
> >> > >> >>> wrote:
> >> > >> >>>
> >> > >> >>>> Hm, the case of `all()` does seem to present a dilemma in the
> >> case of
> >> > >> >>>> variable-length keys.
> >> > >> >>>>
> >> > >> >>>> In the case of fixed-length keys, you can just compute the keys
> >> that
> >> > >> >>>> correspond
> >> > >> >>>> to the maximum and minimum serialized bytes, then perform a
> >> `range()`
> >> > >> >>>> query
> >> > >> >>>> instead of an `all()`. If your keys don't have a well-defined
> >> > >> ordering
> >> > >> >>>> such
> >> > >> >>>> that
> >> > >> >>>> you can't determine the MAX_KEY, then you probably don't care
> >> about
> >> > >> the
> >> > >> >>>> iterator order anyway.
> >> > >> >>>>
> >> > >> >>>>  But with variable-length keys, there is no MAX_KEY. If all your
> >> > >> keys were
> >> > >> >>>> just
> >> > >> >>>> of the form 'a', 'aa', 'aaaaa', 'aaaaaaa' then in fact the only
> >> way
> >> > >> to
> >> > >> >>>> figure out the
> >> > >> >>>> maximum key in the store is by using `all()` -- and without a
> >> reverse
> >> > >> >>>> iterator, you're
> >> > >> >>>> doomed to iterate through every single key just to answer that
> >> simple
> >> > >> >>>> question.
> >> > >> >>>>
> >> > >> >>>> That said, I still think determining the iterator order based
> >> on the
> >> > >> >>>> to/from bytes
> >> > >> >>>> makes a lot of intuitive sense and gives the API a nice
> >> symmetry.
> >> > >> What if
> >> > >> >>>> we
> >> > >> >>>> solved the `all()` problem by just giving `all()` a reverse
> >> form to
> >> > >> >>>> complement it?
> >> > >> >>>> Ie we would have `all()` and `reverseAll()`, or something to
> >> that
> >> > >> effect.
> >> > >> >>>>
> >> > >> >>>> On Thu, May 21, 2020 at 3:41 PM Jorge Esteban Quilcate Otoya <
> >> > >> >>>> [email protected]> wrote:
> >> > >> >>>>
> >> > >> >>>>> Thanks John.
> >> > >> >>>>>
> >> > >> >>>>> Agree. I like the first approach as well, with StreamsConfig
> >> flag
> >> > >> >>>> passing
> >> > >> >>>>> by via ProcessorContext.
> >> > >> >>>>>
> >> > >> >>>>> Another positive effect with "reverse parameters" is that in
> >> the
> >> > >> case of
> >> > >> >>>>> `fetch(keyFrom, keyTo, timeFrom, timeTo)` users can decide
> >> _which_
> >> > >> pair
> >> > >> >>>> to
> >> > >> >>>>> flip, whether with `ReadDirection` enum it apply to both.
> >> > >> >>>>>
> >> > >> >>>>> The only issue I've found while reviewing the KIP is that
> >> `all()`
> >> > >> won't
> >> > >> >>>> fit
> >> > >> >>>>> within this approach.
> >> > >> >>>>>
> >> > >> >>>>> We could remove it from the KIP and argue that for WindowStore,
> >> > >> >>>>> `fetchAll(0, Long.MAX_VALUE)` can be used to get all in reverse
> >> > >> order,
> >> > >> >>>> and
> >> > >> >>>>> for KeyValueStore, no ordering guarantees are provided.
> >> > >> >>>>>
> >> > >> >>>>> If there is consensus with this changes, I will go and update
> >> the
> >> > >> KIP.
> >> > >> >>>>>
> >> > >> >>>>> On Thu, May 21, 2020 at 3:33 PM John Roesler <
> >> [email protected]>
> >> > >> >>>> wrote:
> >> > >> >>>>>
> >> > >> >>>>>> Hi Jorge,
> >> > >> >>>>>>
> >> > >> >>>>>> Thanks for that idea. I agree, a feature flag would protect
> >> anyone
> >> > >> >>>>>> who may be depending on the current behavior.
> >> > >> >>>>>>
> >> > >> >>>>>> It seems better to locate the feature flag in the
> >> initialization
> >> > >> >>>> logic of
> >> > >> >>>>>> the store, rather than have a method on the "live" store that
> >> > >> changes
> >> > >> >>>>>> its behavior on the fly.
> >> > >> >>>>>>
> >> > >> >>>>>> It seems like there are two options here, one is to add a new
> >> > >> config:
> >> > >> >>>>>>
> >> > >> >>>>>> StreamsConfig.ENABLE_BACKWARDS_ITERATION =
> >> > >> >>>>>>   "enable.backwards.iteration
> >> > >> >>>>>>
> >> > >> >>>>>> Or we can add a feature flag in Materialized, like
> >> > >> >>>>>>
> >> > >> >>>>>> Materialized.enableBackwardsIteration()
> >> > >> >>>>>>
> >> > >> >>>>>> I think I'd personally lean toward the config, for the
> >> following
> >> > >> >>>> reason.
> >> > >> >>>>>> The concern that Sophie raised is that someone's program may
> >> depend
> >> > >> >>>>>> on the existing contract of getting an empty iterator. We
> >> don't
> >> > >> want
> >> > >> >>>> to
> >> > >> >>>>>> switch behavior when they aren't expecting it, so we provide
> >> them a
> >> > >> >>>>>> config to assert that they _are_ expecting the new behavior,
> >> which
> >> > >> >>>>>> means they take responsibility for updating their code to
> >> expect
> >> > >> the
> >> > >> >>>> new
> >> > >> >>>>>> behavior.
> >> > >> >>>>>>
> >> > >> >>>>>> There doesn't seem to be a reason to offer a choice of
> >> behaviors
> >> > >> on a
> >> > >> >>>>>> per-query, or per-store basis. We just want people to be not
> >> > >> surprised
> >> > >> >>>>>> by this change in general.
> >> > >> >>>>>>
> >> > >> >>>>>> What do you think?
> >> > >> >>>>>> Thanks,
> >> > >> >>>>>> -John
> >> > >> >>>>>>
> >> > >> >>>>>> On Wed, May 20, 2020, at 17:37, Jorge Quilcate wrote:
> >> > >> >>>>>>> Thank you both for the great feedback.
> >> > >> >>>>>>>
> >> > >> >>>>>>> I like the "fancy" proposal :), and how it removes the need
> >> for
> >> > >> >>>>>>> additional API methods. And with a feature flag on
> >> `StateStore`,
> >> > >> >>>>>>> disabled by default, should no break current users.
> >> > >> >>>>>>>
> >> > >> >>>>>>> The only side-effect I can think of is that: by moving the
> >> flag
> >> > >> >>>>> upwards,
> >> > >> >>>>>>> all later operations become affected; which might be ok for
> >> most
> >> > >> >>>> (all?)
> >> > >> >>>>>>> cases. I can't think of an scenario where this would be an
> >> issue,
> >> > >> >>>> just
> >> > >> >>>>>>> want to point this out.
> >> > >> >>>>>>>
> >> > >> >>>>>>> If moving to this approach, I'd like to check if I got this
> >> right
> >> > >> >>>>> before
> >> > >> >>>>>>> updating the KIP:
> >> > >> >>>>>>>
> >> > >> >>>>>>> - only `StateStore` will change by having a new method:
> >> > >> >>>>>>> `backwardIteration()`, `false` by default to keep things
> >> > >> compatible.
> >> > >> >>>>>>> - then all `*Stores` will have to update their implementation
> >> > >> based
> >> > >> >>>> on
> >> > >> >>>>>>> this flag.
> >> > >> >>>>>>>
> >> > >> >>>>>>>
> >> > >> >>>>>>> On 20/05/2020 21:02, Sophie Blee-Goldman wrote:
> >> > >> >>>>>>>>> There's no possibility that someone could be relying
> >> > >> >>>>>>>>> on iterating over that range in increasing order, because
> >> that's
> >> > >> >>>> not
> >> > >> >>>>>> what
> >> > >> >>>>>>>>> happens. However, they could indeed be relying on getting
> >> an
> >> > >> >>>> empty
> >> > >> >>>>>>>> iterator
> >> > >> >>>>>>>>
> >> > >> >>>>>>>> I just meant that they might be relying on the assumption
> >> that
> >> > >> the
> >> > >> >>>>>> range
> >> > >> >>>>>>>> query
> >> > >> >>>>>>>> will never return results with decreasing keys. The empty
> >> > >> iterator
> >> > >> >>>>>> wouldn't
> >> > >> >>>>>>>> break that contract, but of course a surprise reverse
> >> iterator
> >> > >> >>>> would.
> >> > >> >>>>>>>>
> >> > >> >>>>>>>> FWIW I actually am in favor of automatically converting to a
> >> > >> >>>> reverse
> >> > >> >>>>>>>> iterator,
> >> > >> >>>>>>>> I just thought we should consider whether this should be
> >> off by
> >> > >> >>>>>> default or
> >> > >> >>>>>>>> even possible to disable at all.
> >> > >> >>>>>>>>
> >> > >> >>>>>>>> On Tue, May 19, 2020 at 7:42 PM John Roesler <
> >> > >> [email protected]
> >> > >> >>>>>
> >> > >> >>>>>> wrote:
> >> > >> >>>>>>>>
> >> > >> >>>>>>>>> Thanks for the response, Sophie,
> >> > >> >>>>>>>>>
> >> > >> >>>>>>>>> I wholeheartedly agree we should take as much into account
> >> as
> >> > >> >>>>> possible
> >> > >> >>>>>>>>> up front, rather than regretting our decisions later. I
> >> actually
> >> > >> >>>> do
> >> > >> >>>>>> share
> >> > >> >>>>>>>>> your vague sense of worry, which was what led me to say
> >> > >> initially
> >> > >> >>>>>> that I
> >> > >> >>>>>>>>> thought my counterproposal might be "too fancy".
> >> Sometimes, it's
> >> > >> >>>>>> better
> >> > >> >>>>>>>>> to be explicit instead of "elegant", if we think more
> >> people
> >> > >> >>>> will be
> >> > >> >>>>>>>>> confused
> >> > >> >>>>>>>>> than not.
> >> > >> >>>>>>>>>
> >> > >> >>>>>>>>> I really don't think that there's any danger of "relying
> >> on a
> >> > >> >>>> bug"
> >> > >> >>>>>> here,
> >> > >> >>>>>>>>> although
> >> > >> >>>>>>>>> people certainly could be relying on current behavior. One
> >> thing
> >> > >> >>>> to
> >> > >> >>>>> be
> >> > >> >>>>>>>>> clear
> >> > >> >>>>>>>>> about (which I just left a more detailed comment in
> >> KAFKA-8159
> >> > >> >>>>> about)
> >> > >> >>>>>> is
> >> > >> >>>>>>>>> that
> >> > >> >>>>>>>>> when we say something like key1 > key2, this ordering is
> >> defined
> >> > >> >>>> by
> >> > >> >>>>>> the
> >> > >> >>>>>>>>> serde's output and nothing else.
> >> > >> >>>>>>>>>
> >> > >> >>>>>>>>> Currently, thanks to your fix in
> >> > >> >>>>>> https://github.com/apache/kafka/pull/6521
> >> > >> >>>>>>>>> ,
> >> > >> >>>>>>>>> the store contract is that for range scans, if from > to,
> >> then
> >> > >> >>>> the
> >> > >> >>>>>> store
> >> > >> >>>>>>>>> must
> >> > >> >>>>>>>>> return an empty iterator. There's no possibility that
> >> someone
> >> > >> >>>> could
> >> > >> >>>>> be
> >> > >> >>>>>>>>> relying
> >> > >> >>>>>>>>> on iterating over that range in increasing order, because
> >> that's
> >> > >> >>>> not
> >> > >> >>>>>> what
> >> > >> >>>>>>>>> happens. However, they could indeed be relying on getting
> >> an
> >> > >> >>>> empty
> >> > >> >>>>>>>>> iterator.
> >> > >> >>>>>>>>>
> >> > >> >>>>>>>>> My counterproposal was to actually change this contract to
> >> say
> >> > >> >>>> that
> >> > >> >>>>>> the
> >> > >> >>>>>>>>> store
> >> > >> >>>>>>>>> must return an iterator over the keys in that range, but
> >> in the
> >> > >> >>>>>> reverse
> >> > >> >>>>>>>>> order.
> >> > >> >>>>>>>>> So, in addition to considering whether this idea is "too
> >> fancy"
> >> > >> >>>> (aka
> >> > >> >>>>>>>>> confusing),
> >> > >> >>>>>>>>> we should also consider the likelihood of breaking an
> >> existing
> >> > >> >>>>>> program with
> >> > >> >>>>>>>>> this behavior/contract change.
> >> > >> >>>>>>>>>
> >> > >> >>>>>>>>> To echo your clarification, I'm also not advocating
> >> strongly in
> >> > >> >>>>> favor
> >> > >> >>>>>> of my
> >> > >> >>>>>>>>> proposal. I just wanted to present it for consideration
> >> > >> alongside
> >> > >> >>>>>> Jorge's
> >> > >> >>>>>>>>> original one.
> >> > >> >>>>>>>>>
> >> > >> >>>>>>>>> Thanks for raising these very good points,
> >> > >> >>>>>>>>> -John
> >> > >> >>>>>>>>>
> >> > >> >>>>>>>>> On Tue, May 19, 2020, at 20:49, Sophie Blee-Goldman wrote:
> >> > >> >>>>>>>>>>> Rather than working around it, I think we should just
> >> fix it
> >> > >> >>>>>>>>>> Now *that's* a "fancy" idea :P
> >> > >> >>>>>>>>>>
> >> > >> >>>>>>>>>> That was my primary concern, although I do have a vague
> >> sense
> >> > >> of
> >> > >> >>>>>> worry
> >> > >> >>>>>>>>>> that we might be allowing users to get into trouble
> >> without
> >> > >> >>>>>> realizing it.
> >> > >> >>>>>>>>>> For example if their custom serdes suffer a similar bug
> >> as the
> >> > >> >>>>> above,
> >> > >> >>>>>>>>>> and/or
> >> > >> >>>>>>>>>> they rely on getting results in increasing order (of the
> >> keys)
> >> > >> >>>> even
> >> > >> >>>>>> when
> >> > >> >>>>>>>>>> to < from. Maybe they're relying on the fact that the
> >> range
> >> > >> >>>> query
> >> > >> >>>>>> returns
> >> > >> >>>>>>>>>> nothing in that case.
> >> > >> >>>>>>>>>>
> >> > >> >>>>>>>>>> Not sure if that qualifies as relying on a bug or not,
> >> but in
> >> > >> >>>> that
> >> > >> >>>>>> past
> >> > >> >>>>>>>>>> we've
> >> > >> >>>>>>>>>> taken the stance that we should not break compatibility
> >> even if
> >> > >> >>>> the
> >> > >> >>>>>> user
> >> > >> >>>>>>>>>> was relying on bugs or unintentional behavior.
> >> > >> >>>>>>>>>>
> >> > >> >>>>>>>>>> Just to clarify I'm not advocating strongly against this
> >> > >> >>>> proposal,
> >> > >> >>>>>> just
> >> > >> >>>>>>>>>> laying
> >> > >> >>>>>>>>>> out some considerations we should take into account. At
> >> the end
> >> > >> >>>> of
> >> > >> >>>>>> the
> >> > >> >>>>>>>>> day
> >> > >> >>>>>>>>>> we should do what's right rather than maintain
> >> compatibility
> >> > >> >>>> with
> >> > >> >>>>>>>>> existing
> >> > >> >>>>>>>>>> bugs, but sometimes there's a reasonable middle ground.
> >> > >> >>>>>>>>>>
> >> > >> >>>>>>>>>> On Tue, May 19, 2020 at 6:15 PM John Roesler <
> >> > >> >>>> [email protected]>
> >> > >> >>>>>>>>> wrote:
> >> > >> >>>>>>>>>>> Thanks Sophie,
> >> > >> >>>>>>>>>>>
> >> > >> >>>>>>>>>>> Woah, that’s a nasty bug. Rather than working around it,
> >> I
> >> > >> >>>> think
> >> > >> >>>>> we
> >> > >> >>>>>>>>> should
> >> > >> >>>>>>>>>>> just fix it. I’ll leave some comments on the Jira.
> >> > >> >>>>>>>>>>>
> >> > >> >>>>>>>>>>> It doesn’t seem like it should be this KIP’s concern
> >> that some
> >> > >> >>>>>> serdes
> >> > >> >>>>>>>>>>> might be incorrectly written.
> >> > >> >>>>>>>>>>>
> >> > >> >>>>>>>>>>> Were there other practical concerns that you had in mind?
> >> > >> >>>>>>>>>>>
> >> > >> >>>>>>>>>>> Thanks,
> >> > >> >>>>>>>>>>> John
> >> > >> >>>>>>>>>>>
> >> > >> >>>>>>>>>>> On Tue, May 19, 2020, at 19:10, Sophie Blee-Goldman
> >> wrote:
> >> > >> >>>>>>>>>>>> I like this "fancy idea" to just flip the to/from bytes
> >> but I
> >> > >> >>>>> think
> >> > >> >>>>>>>>> there
> >> > >> >>>>>>>>>>>> are some practical limitations to implementing this. In
> >> > >> >>>>> particular
> >> > >> >>>>>>>>>>>> I'm thinking about this issue
> >> > >> >>>>>>>>>>>> <https://issues.apache.org/jira/browse/KAFKA-8159>
> >> with the
> >> > >> >>>>>> built-in
> >> > >> >>>>>>>>>>> signed
> >> > >> >>>>>>>>>>>> number serdes.
> >> > >> >>>>>>>>>>>>
> >> > >> >>>>>>>>>>>> This trick would actually fix the problem for
> >> > >> >>>> negative-negative
> >> > >> >>>>>>>>> queries
> >> > >> >>>>>>>>>>>> (ie where to & from are negative) but would cause
> >> > >> undetectable
> >> > >> >>>>>>>>>>>> incorrect results for negative-positive queries. For
> >> example,
> >> > >> >>>> say
> >> > >> >>>>>> you
> >> > >> >>>>>>>>>>>> call #range with from = -1 and to = 1, using the Short
> >> > >> serdes.
> >> > >> >>>>> The
> >> > >> >>>>>>>>>>>> serialized bytes for that are
> >> > >> >>>>>>>>>>>>
> >> > >> >>>>>>>>>>>> from = 1111111111111111
> >> > >> >>>>>>>>>>>> to = 0000000000000001
> >> > >> >>>>>>>>>>>>
> >> > >> >>>>>>>>>>>> so we would end up flipping those and iterating over
> >> all keys
> >> > >> >>>>> from
> >> > >> >>>>>>>>>>>> 0000000000000001 to 1111111111111111. Iterating in
> >> > >> >>>>> lexicographical
> >> > >> >>>>>>>>>>>> order means we would iterate over every key in the space
> >> > >> >>>> *except*
> >> > >> >>>>>> for
> >> > >> >>>>>>>>>>>> 0, but 0 is actually the *only* other key we meant to be
> >> > >> >>>> included
> >> > >> >>>>>> in
> >> > >> >>>>>>>>> the
> >> > >> >>>>>>>>>>>> range query.
> >> > >> >>>>>>>>>>>>
> >> > >> >>>>>>>>>>>> Currently we just log a warning and return an empty
> >> iterator
> >> > >> >>>> when
> >> > >> >>>>>>>>>>>> to < from, which is obviously also incorrect but feels
> >> > >> >>>> slightly
> >> > >> >>>>>> more
> >> > >> >>>>>>>>>>>> palatable. If we start automatically converting to
> >> reverse
> >> > >> >>>>> queries
> >> > >> >>>>>> we
> >> > >> >>>>>>>>>>>> can't even log a warning in this case unless we wanted
> >> to log
> >> > >> >>>> a
> >> > >> >>>>>>>>> warning
> >> > >> >>>>>>>>>>>> every time, which would be weird to do for a valid
> >> usage of a
> >> > >> >>>> new
> >> > >> >>>>>>>>>>>> feature.
> >> > >> >>>>>>>>>>>>
> >> > >> >>>>>>>>>>>> All that said, I still like the idea overall. Off the
> >> top of
> >> > >> >>>> my
> >> > >> >>>>>> head
> >> > >> >>>>>>>>> I
> >> > >> >>>>>>>>>>> guess
> >> > >> >>>>>>>>>>>> we could add a store config to enable/disable automatic
> >> > >> >>>> reverse
> >> > >> >>>>>>>>>>> iteration,
> >> > >> >>>>>>>>>>>> which is off by default?
> >> > >> >>>>>>>>>>>>
> >> > >> >>>>>>>>>>>> Thanks for the KIP! This will be a nice addition
> >> > >> >>>>>>>>>>>>
> >> > >> >>>>>>>>>>>> Sophie
> >> > >> >>>>>>>>>>>>
> >> > >> >>>>>>>>>>>>
> >> > >> >>>>>>>>>>>> On Tue, May 19, 2020 at 3:21 PM John Roesler <
> >> > >> >>>>> [email protected]>
> >> > >> >>>>>>>>>>> wrote:
> >> > >> >>>>>>>>>>>>> Hi there Jorge,
> >> > >> >>>>>>>>>>>>>
> >> > >> >>>>>>>>>>>>> Thanks for the KIP!
> >> > >> >>>>>>>>>>>>>
> >> > >> >>>>>>>>>>>>> I think this feature sounds very reasonable.
> >> > >> >>>>>>>>>>>>>
> >> > >> >>>>>>>>>>>>> I'm not 100% sure if this is "too fancy", but what do
> >> you
> >> > >> >>>> think
> >> > >> >>>>>>>>>>>>> about avoiding the enum by instead allowing people to
> >> flip
> >> > >> >>>>>>>>>>>>> the "from" and "to" endpoints? I.e., reading from "A"
> >> to "Z"
> >> > >> >>>>> would
> >> > >> >>>>>>>>>>>>> be a forward scan, and from "Z" to "A" would be a
> >> backward
> >> > >> >>>> one?
> >> > >> >>>>>>>>>>>>>
> >> > >> >>>>>>>>>>>>> Thanks,
> >> > >> >>>>>>>>>>>>> -John
> >> > >> >>>>>>>>>>>>>
> >> > >> >>>>>>>>>>>>> On Tue, May 19, 2020, at 16:20, Jorge Quilcate wrote:
> >> > >> >>>>>>>>>>>>>> Hi everyone,
> >> > >> >>>>>>>>>>>>>>
> >> > >> >>>>>>>>>>>>>> I would like to start the discussion for KIP-617:
> >> > >> >>>>>>>>>>>>>>
> >> > >> >>>>>>>>>
> >> > >> >>>>>>
> >> > >> >>>>>
> >> > >> >>>>
> >> > >>
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-617%3A+Allow+Kafka+Streams+State+Stores+to+be+iterated+backwards
> >> > >> >>>>>>>>>>>>>> Looking forward to your feedback.
> >> > >> >>>>>>>>>>>>>>
> >> > >> >>>>>>>>>>>>>> Thanks!
> >> > >> >>>>>>>>>>>>>> Jorge.
> >> > >> >>>>>>>>>>>>>>
> >> > >> >>>>>>>>>>>>>>
> >> > >> >>>>>>>>>>>>>> Attachments:
> >> > >> >>>>>>>>>>>>>> * 0x5F2C6E22064982DF.asc
> >> > >> >>>>>>>
> >> > >> >>>>>>>
> >> > >> >>>>>>> Attachments:
> >> > >> >>>>>>> * 0x5F2C6E22064982DF.asc
> >> > >> >>>>>>
> >> > >> >>>>>
> >> > >> >>>>
> >> > >> >>>
> >> > >> >>
> >> > >>
> >> > >>
> >> >
> >>
> >
>

Re: [DISCUSS] KIP-617: Allow Kafka Streams State Stores to be iterated backwards

Reply via email to