Thanks John. > it looks like there’s a revision error in which two methods are proposed for SessionStore, but seem like they should be in ReadOnlySessionStore. Do I read that right?
Yes, I've opted to keep the new methods alongside the existing ones. In the case of SessionStore, `findSessions` are in `SessionStore`, and `fetch` are in `ReadOnlySessionStore`. If it makes more sense, I can move all of them to ReadOnlySessionStore. Let me know what you think. Thanks, Jorge. On Thu, Jul 2, 2020 at 2:36 PM John Roesler <vvcep...@apache.org> wrote: > Hi Jorge, > > Thanks for the update. I think this is a good plan. > > I just took a look at the KIP again, and it looks like there’s a revision > error in which two methods are proposed for SessionStore, but seem like > they should be in ReadOnlySessionStore. Do I read that right? > > Otherwise, I’m happy with your proposal. > > Thanks, > John > > On Wed, Jul 1, 2020, at 17:01, Jorge Esteban Quilcate Otoya wrote: > > Quick update: KIP is updated with latest changes now. > > Will leave it open this week while working on the PR. > > > > Hope to open a new vote thread over the next few days if no additional > > feedback is provided. > > > > Cheers, > > Jorge. > > > > On Mon, Jun 29, 2020 at 11:30 AM Jorge Esteban Quilcate Otoya < > > quilcate.jo...@gmail.com> wrote: > > > > > Thanks, John! > > > > > > Make sense to reconsider the current approach. I was heading in a > similar > > > direction while drafting the implementation. Metered, Caching, and > other > > > layers will also have to get duplicated to build up new methods in > `Stores` > > > factory, and class casting issues would appear on stores created by > DSL. > > > > > > I will draft a proposal with new methods (move methods from proposed > > > interfaces to existing ones) with default implementation in a KIP > update > > > and wait for Matthias to chime in to validate this approach. > > > > > > Jorge. > > > > > > > > > On Sat, Jun 27, 2020 at 4:01 PM John Roesler <vvcep...@apache.org> > wrote: > > > > > >> Hi Jorge, > > >> > > >> Sorry for my silence, I've been absorbed with the 2.6 and 2.5.1 > releases. > > >> > > >> The idea to separate the new methods into "mixin" interfaces seems > > >> like a good one, but as we've discovered in KIP-614, it doesn't work > > >> out that way in practice. The problem is that the store > implementations > > >> are just the base layer that get composed with other layers in Streams > > >> before they can be accessed in the DSL. This is extremely subtle, so > > >> I'm going to put everyone to sleep with a detailed explanation: > > >> > > >> For example, this is the mechanism by which all KeyValueStore > > >> implementations get added to Streams: > > >> org.apache.kafka.streams.state.internals.KeyValueStoreBuilder#build > > >> return new MeteredKeyValueStore<>( > > >> maybeWrapCaching(maybeWrapLogging(storeSupplier.get())), > > >> storeSupplier.metricsScope(), > > >> time, > > >> keySerde, > > >> valueSerde > > >> ); > > >> > > >> In the DSL, the store that a processor gets from the context would be > > >> the result of this composition. So even if the storeSupplier.get() > returns > > >> a store that implements the "reverse" interface, when you try to use > it > > >> from a processor like: > > >> org.apache.kafka.streams.kstream.ValueTransformerWithKey#init > > >> ReadOnlyBackwardWindowStore<K, V> store = > > >> (ReadOnlyBackwardWindowStore<K, V>) context.getStateStore(..) > > >> > > >> You'd just get a ClassCastException because it's actually a > > >> MeteredKeyValueStore, which doesn't implement > > >> ReadOnlyBackwardWindowStore. > > >> > > >> The only way to make this work would be to make the Metered, > > >> Caching, and Logging layers also implement the new interfaces, > > >> but this effectively forces implementations to also implement > > >> the interface. Otherwise, the intermediate layers would have to > > >> cast the store in each method, like this: > > >> MeteredWindowStore#backwardFetch { > > >> ((ReadOnlyBackwardWindowStore<K, V>) innerStore).backwardFetch(..) > > >> } > > >> > > >> And then if the implementation doesn't "opt in" by implementing > > >> the interface, you'd get a ClassCastException, not when you get the > > >> store, but when you try to use it. > > >> > > >> The fact that we get ClassCastExceptions no matter which way we > > >> turn here indicates that we're really not getting any benefit from the > > >> type system, which makes the extra interfaces seem not worth all the > > >> code involved. > > >> > > >> Where we landed in KIP-614 is that, unless we want to completely > > >> revamp the way that StateStores work in the DSL, you might as > > >> well just add the new methods to the existing interfaces. To prevent > > >> compilation errors, we can add default implementations that throw > > >> UnsupportedOperationException. If a store doesn't opt in by > > >> implementing the methods, you'd get an UnsupportedOperationException, > > >> which seems no worse, and maybe better, than the ClassCastException > > >> you'd get if we go with the "mixin interface" approach. > > >> > > >> A quick note: This entire discussion focuses on the DSL. If you're > just > > >> using the Processor API by directly adding the a custom store to the > > >> Topology: > > >> org.apache.kafka.streams.Topology#addStateStore > > >> and then retrieving it in the processor via: > > >> org.apache.kafka.streams.processor.ProcessorContext#getStateStore > > >> in > > >> org.apache.kafka.streams.processor.Processor#init > > >> > > >> Then, you can both register and retrieve _any_ StateStore > implementation. > > >> There's no need to use KeyValueStore or any other built-in interface. > > >> In other words, KeyValueStore and company are only part of the DSL, > > >> not the PAPI. So, discussions about the build-in store interfaces are > only > > >> really relevant in the context of the DSL, Transformers, and > Materialized. > > >> > > >> So, in conclusion, I'd really recommend just adding any new methods to > > >> the existing store interfaces. We might be able to revamp the API in > the > > >> future to support mixins, but it's a much larger scope of work than > this > > >> KIP. > > >> A more minor comment is that we don't need to add Deprecated variants > > >> of new methods. > > >> > > >> Thanks again, and once again, I'm sorry I tuned out and didn't offer > this > > >> feedback before you revised the KIP. > > >> -John > > >> > > >> > > >> > > >> > > >> On Mon, Jun 22, 2020, at 06:11, Jorge Esteban Quilcate Otoya wrote: > > >> > Hi everyone, > > >> > > > >> > I've updated the KIP, applying Matthias' feedback regarding > interface > > >> > hierarchy. > > >> > > > >> > Also, following the last email, I think we can consider reverse > > >> operations > > >> > on KeyValue range as well, as implementation supports lexicographic > > >> order. > > >> > > > >> > I considered different naming between Key-based ranges and > Time-based > > >> > ranges, and mitigate confusion when fetching keys and time ranges as > > >> > WindowStore does: > > >> > > > >> > Key-based ranges: reverseRange(), reverseAll() > > >> > Time-based ranges: backwardFetch() > > >> > > > >> > Then, key-based changes apply to KeyValueStore, and time-based > changes > > >> to > > >> > Window and Session stores. > > >> > > > >> > Let me know if you have any questions. > > >> > > > >> > Thanks, > > >> > Jorge. > > >> > > > >> > > > >> > On Tue, Jun 16, 2020 at 12:47 AM Jorge Esteban Quilcate Otoya < > > >> > quilcate.jo...@gmail.com> wrote: > > >> > > > >> > > Hi everyone, sorry for the late reply. > > >> > > > > >> > > Thanks Matthias for your feedback. I think it makes sense to > > >> reconsider > > >> > > the current design based on your input. > > >> > > > > >> > > After digging deeper into the current implementation, I'd like to > > >> bring my > > >> > > current understanding to be double-checked as it might be > redefining > > >> the > > >> > > KIP's scope: > > >> > > > > >> > > 1. There are 2 ranges been exposed by different stores: > > >> > > > > >> > > a. Key Range > > >> > > b. Timestamp Range > > >> > > > > >> > > So far, we have discussed covering both. > > >> > > > > >> > > 2. Key Range functions do not provide ordering guarantees by > design: > > >> > > > > >> > > ```ReadOnlyKeyValueStore.java > > >> > > /** > > >> > > * Get an iterator over a given range of keys. This iterator > must > > >> be > > >> > > closed after use. > > >> > > * The returned iterator must be safe from {@link > > >> > > java.util.ConcurrentModificationException}s > > >> > > * and must not return null values. No ordering guarantees are > > >> > > provided. > > >> > > * ... > > >> > > */ > > >> > > KeyValueIterator<K, V> range(K from, K to); > > >> > > ``` > > >> > > > > >> > > Therefore, I'd propose removing Key range operations from the > scope. > > >> > > > > >> > > 3. Timestamp Range operations happen at the SegmentsStore level > > >> (internal) > > >> > > API > > >> > > > > >> > > AFAICT, Segments wrappers handle all Timestamp ranges queries. > > >> > > > > >> > > I'd propose extending `Segments#segments(timeFrom, timeTo, > backwards)` > > >> > > with a flag for backwards operations. > > >> > > > > >> > > As segments returned will be processed backwards, I'm not > extending > > >> > > KeyValueStores to query each segment backwards as previous point > 2. > > >> > > > > >> > > 4. Extend WindowStores implementations with a new > > >> > > WindowBackwardStore/ReadOnlyBackwardStore: > > >> > > > > >> > > ```java > > >> > > public interface ReadOnlyWindowBackwardStore<K, V> { > > >> > > WindowStoreIterator<V> backwardFetch(K key, Instant from, > Instant > > >> to) > > >> > > throws IllegalArgumentException; > > >> > > > > >> > > KeyValueIterator<Windowed<K>, V> backwardFetch(K from, K to, > > >> Instant > > >> > > fromTime, Instant toTime) > > >> > > throws IllegalArgumentException; > > >> > > > > >> > > KeyValueIterator<Windowed<K>, V> backwardFetchAll(Instant > from, > > >> > > Instant to) throws IllegalArgumentException; > > >> > > ``` > > >> > > > > >> > > 5. SessionStore is a bit different as it has fetch/find sessions > > >> spread > > >> > > between SessionStore and ReadOnlySessionStore. > > >> > > > > >> > > I'd propose a new interface `SessionBackwardStore` to expose > backward > > >> find > > >> > > operations: > > >> > > > > >> > > ```java > > >> > > public interface SessionBackwardStore<K, AGG> { > > >> > > KeyValueIterator<Windowed<K>, AGG> backwardFindSessions(final > K > > >> key, > > >> > > final long earliestSessionEndTime, final long > latestSessionStartTime); > > >> > > > > >> > > KeyValueIterator<Windowed<K>, AGG> backwardFindSessions(final > K > > >> > > keyFrom, final K keyTo, final long earliestSessionEndTime, final > long > > >> > > latestSessionStartTime); > > >> > > } > > >> > > ``` > > >> > > > > >> > > If this understanding is correct I'll proceed to update the KIP > based > > >> on > > >> > > this. > > >> > > > > >> > > Looking forward to your feedback. > > >> > > > > >> > > Thanks, > > >> > > Jorge. > > >> > > > > >> > > On Fri, May 29, 2020 at 3:32 AM Matthias J. Sax <mj...@apache.org > > > > >> wrote: > > >> > > > > >> > >> Hey, > > >> > >> > > >> > >> Sorry that I am late to the game. I am not 100% convinced about > the > > >> > >> current proposal. Using a new config as feature flag seems to be > > >> rather > > >> > >> "nasty" to me, and flipping from/to is a little bit too fancy > for my > > >> > >> personal taste. > > >> > >> > > >> > >> I agree, that the original proposal using a "ReadDirection" enum > is > > >> not > > >> > >> ideal either. > > >> > >> > > >> > >> Thus, I would like to put out a new idea: We could add a new > > >> interface > > >> > >> that offers new methods that return revers iterators. > > >> > >> > > >> > >> The KIP already proposes to add `reverseAll()` and it seems > backward > > >> > >> incompatible to just add this method to `ReadOnlyKeyValueStore` > and > > >> > >> `ReadOnlyWindowStore`. I don't think we could provide a useful > > >> default > > >> > >> implementation for custom stores and thus either break > compatibility > > >> or > > >> > >> need add a default that just throws an exception. Neither seems > to > > >> be a > > >> > >> good option. > > >> > >> > > >> > >> Using a new interface avoid this issue and allows users > implementing > > >> > >> custom stores to opt-in by adding the interface to their stores. > > >> > >> Furthermore, we don't need any config. In the end, we encapsulte > the > > >> > >> change into the store, and our runtime is agnostic to it (as it > > >> should > > >> > >> be). > > >> > >> > > >> > >> The hierarchy becomes a little complex (but uses would not > really see > > >> > >> the complexity): > > >> > >> > > >> > >> // exsiting > > >> > >> ReadOnlyKeyValueStore > > >> > >> KeyValueStore extend StateStore, ReadOnlyKeyValueStore > > >> > >> > > >> > >> > > >> > >> // helper interface; users don't care > > >> > >> // need similar ones for other stores > > >> > >> ReverseReadOnlyKeyValueStore { > > >> > >> KeyValueIterator<K, V> reverseRange(K from, K to); > > >> > >> KeyValueIterator<K, V> reverseAll(); > > >> > >> } > > >> > >> > > >> > >> > > >> > >> // two new user facing interfaces for kv-store > > >> > >> // need similar ones for other stores > > >> > >> ReadOnlyKeyValueStoreWithReverseIterators extends > > >> ReadOnlyKeyValueStore, > > >> > >> ReverseReadOnlyKeyValueStore > > >> > >> > > >> > >> KeyValueStoreWithReverseIterators extends KeyValueStore, > > >> > >> ReverseReadOnlyKeyValueStore > > >> > >> > > >> > >> > > >> > >> // updated (also internal) > > >> > >> // also need to update other built-in stores > > >> > >> RocksDB implements KeyValueStoreWithReverseIterators, > > >> BulkLoadingStore > > >> > >> > > >> > >> > > >> > >> In the end, user would only care about the two (for kv-case) new > > >> > >> interface that offer revers iterator (read only and regular) and > can > > >> > >> cast stores accordingly in their Processors/Transformers or via > IQ. > > >> > >> > > >> > >> > > >> > >> Btw: if we add revers iterator for KeyValue and Window store, > should > > >> we > > >> > >> do the same for Session store? > > >> > >> > > >> > >> > > >> > >> > > >> > >> This might be more code to write, but I believe it provides the > > >> better > > >> > >> user experience. Thoughts? > > >> > >> > > >> > >> > > >> > >> > > >> > >> -Matthias > > >> > >> > > >> > >> > > >> > >> > > >> > >> > > >> > >> On 5/26/20 8:47 PM, John Roesler wrote: > > >> > >> > Sorry for my silence, Jorge, > > >> > >> > > > >> > >> > I've just taken another look, and I'm personally happy with the > > >> KIP. > > >> > >> > > > >> > >> > Thanks, > > >> > >> > -John > > >> > >> > > > >> > >> > On Tue, May 26, 2020, at 16:17, Jorge Esteban Quilcate Otoya > wrote: > > >> > >> >> If no additional comments, I will proceed to start the a vote > > >> thread. > > >> > >> >> > > >> > >> >> Thanks a lot for your feedback! > > >> > >> >> > > >> > >> >> On Fri, May 22, 2020 at 9:25 AM Jorge Esteban Quilcate Otoya < > > >> > >> >> quilcate.jo...@gmail.com> wrote: > > >> > >> >> > > >> > >> >>> Thanks Sophie. I like the `reverseAll()` idea. > > >> > >> >>> > > >> > >> >>> I updated the KIP with your feedback. > > >> > >> >>> > > >> > >> >>> > > >> > >> >>> > > >> > >> >>> On Fri, May 22, 2020 at 4:22 AM Sophie Blee-Goldman < > > >> > >> sop...@confluent.io> > > >> > >> >>> wrote: > > >> > >> >>> > > >> > >> >>>> Hm, the case of `all()` does seem to present a dilemma in > the > > >> case of > > >> > >> >>>> variable-length keys. > > >> > >> >>>> > > >> > >> >>>> In the case of fixed-length keys, you can just compute the > keys > > >> that > > >> > >> >>>> correspond > > >> > >> >>>> to the maximum and minimum serialized bytes, then perform a > > >> `range()` > > >> > >> >>>> query > > >> > >> >>>> instead of an `all()`. If your keys don't have a > well-defined > > >> > >> ordering > > >> > >> >>>> such > > >> > >> >>>> that > > >> > >> >>>> you can't determine the MAX_KEY, then you probably don't > care > > >> about > > >> > >> the > > >> > >> >>>> iterator order anyway. > > >> > >> >>>> > > >> > >> >>>> But with variable-length keys, there is no MAX_KEY. If all > your > > >> > >> keys were > > >> > >> >>>> just > > >> > >> >>>> of the form 'a', 'aa', 'aaaaa', 'aaaaaaa' then in fact the > only > > >> way > > >> > >> to > > >> > >> >>>> figure out the > > >> > >> >>>> maximum key in the store is by using `all()` -- and without > a > > >> reverse > > >> > >> >>>> iterator, you're > > >> > >> >>>> doomed to iterate through every single key just to answer > that > > >> simple > > >> > >> >>>> question. > > >> > >> >>>> > > >> > >> >>>> That said, I still think determining the iterator order > based > > >> on the > > >> > >> >>>> to/from bytes > > >> > >> >>>> makes a lot of intuitive sense and gives the API a nice > > >> symmetry. > > >> > >> What if > > >> > >> >>>> we > > >> > >> >>>> solved the `all()` problem by just giving `all()` a reverse > > >> form to > > >> > >> >>>> complement it? > > >> > >> >>>> Ie we would have `all()` and `reverseAll()`, or something to > > >> that > > >> > >> effect. > > >> > >> >>>> > > >> > >> >>>> On Thu, May 21, 2020 at 3:41 PM Jorge Esteban Quilcate > Otoya < > > >> > >> >>>> quilcate.jo...@gmail.com> wrote: > > >> > >> >>>> > > >> > >> >>>>> Thanks John. > > >> > >> >>>>> > > >> > >> >>>>> Agree. I like the first approach as well, with > StreamsConfig > > >> flag > > >> > >> >>>> passing > > >> > >> >>>>> by via ProcessorContext. > > >> > >> >>>>> > > >> > >> >>>>> Another positive effect with "reverse parameters" is that > in > > >> the > > >> > >> case of > > >> > >> >>>>> `fetch(keyFrom, keyTo, timeFrom, timeTo)` users can decide > > >> _which_ > > >> > >> pair > > >> > >> >>>> to > > >> > >> >>>>> flip, whether with `ReadDirection` enum it apply to both. > > >> > >> >>>>> > > >> > >> >>>>> The only issue I've found while reviewing the KIP is that > > >> `all()` > > >> > >> won't > > >> > >> >>>> fit > > >> > >> >>>>> within this approach. > > >> > >> >>>>> > > >> > >> >>>>> We could remove it from the KIP and argue that for > WindowStore, > > >> > >> >>>>> `fetchAll(0, Long.MAX_VALUE)` can be used to get all in > reverse > > >> > >> order, > > >> > >> >>>> and > > >> > >> >>>>> for KeyValueStore, no ordering guarantees are provided. > > >> > >> >>>>> > > >> > >> >>>>> If there is consensus with this changes, I will go and > update > > >> the > > >> > >> KIP. > > >> > >> >>>>> > > >> > >> >>>>> On Thu, May 21, 2020 at 3:33 PM John Roesler < > > >> vvcep...@apache.org> > > >> > >> >>>> wrote: > > >> > >> >>>>> > > >> > >> >>>>>> Hi Jorge, > > >> > >> >>>>>> > > >> > >> >>>>>> Thanks for that idea. I agree, a feature flag would > protect > > >> anyone > > >> > >> >>>>>> who may be depending on the current behavior. > > >> > >> >>>>>> > > >> > >> >>>>>> It seems better to locate the feature flag in the > > >> initialization > > >> > >> >>>> logic of > > >> > >> >>>>>> the store, rather than have a method on the "live" store > that > > >> > >> changes > > >> > >> >>>>>> its behavior on the fly. > > >> > >> >>>>>> > > >> > >> >>>>>> It seems like there are two options here, one is to add a > new > > >> > >> config: > > >> > >> >>>>>> > > >> > >> >>>>>> StreamsConfig.ENABLE_BACKWARDS_ITERATION = > > >> > >> >>>>>> "enable.backwards.iteration > > >> > >> >>>>>> > > >> > >> >>>>>> Or we can add a feature flag in Materialized, like > > >> > >> >>>>>> > > >> > >> >>>>>> Materialized.enableBackwardsIteration() > > >> > >> >>>>>> > > >> > >> >>>>>> I think I'd personally lean toward the config, for the > > >> following > > >> > >> >>>> reason. > > >> > >> >>>>>> The concern that Sophie raised is that someone's program > may > > >> depend > > >> > >> >>>>>> on the existing contract of getting an empty iterator. We > > >> don't > > >> > >> want > > >> > >> >>>> to > > >> > >> >>>>>> switch behavior when they aren't expecting it, so we > provide > > >> them a > > >> > >> >>>>>> config to assert that they _are_ expecting the new > behavior, > > >> which > > >> > >> >>>>>> means they take responsibility for updating their code to > > >> expect > > >> > >> the > > >> > >> >>>> new > > >> > >> >>>>>> behavior. > > >> > >> >>>>>> > > >> > >> >>>>>> There doesn't seem to be a reason to offer a choice of > > >> behaviors > > >> > >> on a > > >> > >> >>>>>> per-query, or per-store basis. We just want people to be > not > > >> > >> surprised > > >> > >> >>>>>> by this change in general. > > >> > >> >>>>>> > > >> > >> >>>>>> What do you think? > > >> > >> >>>>>> Thanks, > > >> > >> >>>>>> -John > > >> > >> >>>>>> > > >> > >> >>>>>> On Wed, May 20, 2020, at 17:37, Jorge Quilcate wrote: > > >> > >> >>>>>>> Thank you both for the great feedback. > > >> > >> >>>>>>> > > >> > >> >>>>>>> I like the "fancy" proposal :), and how it removes the > need > > >> for > > >> > >> >>>>>>> additional API methods. And with a feature flag on > > >> `StateStore`, > > >> > >> >>>>>>> disabled by default, should no break current users. > > >> > >> >>>>>>> > > >> > >> >>>>>>> The only side-effect I can think of is that: by moving > the > > >> flag > > >> > >> >>>>> upwards, > > >> > >> >>>>>>> all later operations become affected; which might be ok > for > > >> most > > >> > >> >>>> (all?) > > >> > >> >>>>>>> cases. I can't think of an scenario where this would be > an > > >> issue, > > >> > >> >>>> just > > >> > >> >>>>>>> want to point this out. > > >> > >> >>>>>>> > > >> > >> >>>>>>> If moving to this approach, I'd like to check if I got > this > > >> right > > >> > >> >>>>> before > > >> > >> >>>>>>> updating the KIP: > > >> > >> >>>>>>> > > >> > >> >>>>>>> - only `StateStore` will change by having a new method: > > >> > >> >>>>>>> `backwardIteration()`, `false` by default to keep things > > >> > >> compatible. > > >> > >> >>>>>>> - then all `*Stores` will have to update their > implementation > > >> > >> based > > >> > >> >>>> on > > >> > >> >>>>>>> this flag. > > >> > >> >>>>>>> > > >> > >> >>>>>>> > > >> > >> >>>>>>> On 20/05/2020 21:02, Sophie Blee-Goldman wrote: > > >> > >> >>>>>>>>> There's no possibility that someone could be relying > > >> > >> >>>>>>>>> on iterating over that range in increasing order, > because > > >> that's > > >> > >> >>>> not > > >> > >> >>>>>> what > > >> > >> >>>>>>>>> happens. However, they could indeed be relying on > getting > > >> an > > >> > >> >>>> empty > > >> > >> >>>>>>>> iterator > > >> > >> >>>>>>>> > > >> > >> >>>>>>>> I just meant that they might be relying on the > assumption > > >> that > > >> > >> the > > >> > >> >>>>>> range > > >> > >> >>>>>>>> query > > >> > >> >>>>>>>> will never return results with decreasing keys. The > empty > > >> > >> iterator > > >> > >> >>>>>> wouldn't > > >> > >> >>>>>>>> break that contract, but of course a surprise reverse > > >> iterator > > >> > >> >>>> would. > > >> > >> >>>>>>>> > > >> > >> >>>>>>>> FWIW I actually am in favor of automatically converting > to a > > >> > >> >>>> reverse > > >> > >> >>>>>>>> iterator, > > >> > >> >>>>>>>> I just thought we should consider whether this should be > > >> off by > > >> > >> >>>>>> default or > > >> > >> >>>>>>>> even possible to disable at all. > > >> > >> >>>>>>>> > > >> > >> >>>>>>>> On Tue, May 19, 2020 at 7:42 PM John Roesler < > > >> > >> vvcep...@apache.org > > >> > >> >>>>> > > >> > >> >>>>>> wrote: > > >> > >> >>>>>>>> > > >> > >> >>>>>>>>> Thanks for the response, Sophie, > > >> > >> >>>>>>>>> > > >> > >> >>>>>>>>> I wholeheartedly agree we should take as much into > account > > >> as > > >> > >> >>>>> possible > > >> > >> >>>>>>>>> up front, rather than regretting our decisions later. I > > >> actually > > >> > >> >>>> do > > >> > >> >>>>>> share > > >> > >> >>>>>>>>> your vague sense of worry, which was what led me to say > > >> > >> initially > > >> > >> >>>>>> that I > > >> > >> >>>>>>>>> thought my counterproposal might be "too fancy". > > >> Sometimes, it's > > >> > >> >>>>>> better > > >> > >> >>>>>>>>> to be explicit instead of "elegant", if we think more > > >> people > > >> > >> >>>> will be > > >> > >> >>>>>>>>> confused > > >> > >> >>>>>>>>> than not. > > >> > >> >>>>>>>>> > > >> > >> >>>>>>>>> I really don't think that there's any danger of > "relying > > >> on a > > >> > >> >>>> bug" > > >> > >> >>>>>> here, > > >> > >> >>>>>>>>> although > > >> > >> >>>>>>>>> people certainly could be relying on current behavior. > One > > >> thing > > >> > >> >>>> to > > >> > >> >>>>> be > > >> > >> >>>>>>>>> clear > > >> > >> >>>>>>>>> about (which I just left a more detailed comment in > > >> KAFKA-8159 > > >> > >> >>>>> about) > > >> > >> >>>>>> is > > >> > >> >>>>>>>>> that > > >> > >> >>>>>>>>> when we say something like key1 > key2, this ordering > is > > >> defined > > >> > >> >>>> by > > >> > >> >>>>>> the > > >> > >> >>>>>>>>> serde's output and nothing else. > > >> > >> >>>>>>>>> > > >> > >> >>>>>>>>> Currently, thanks to your fix in > > >> > >> >>>>>> https://github.com/apache/kafka/pull/6521 > > >> > >> >>>>>>>>> , > > >> > >> >>>>>>>>> the store contract is that for range scans, if from > > to, > > >> then > > >> > >> >>>> the > > >> > >> >>>>>> store > > >> > >> >>>>>>>>> must > > >> > >> >>>>>>>>> return an empty iterator. There's no possibility that > > >> someone > > >> > >> >>>> could > > >> > >> >>>>> be > > >> > >> >>>>>>>>> relying > > >> > >> >>>>>>>>> on iterating over that range in increasing order, > because > > >> that's > > >> > >> >>>> not > > >> > >> >>>>>> what > > >> > >> >>>>>>>>> happens. However, they could indeed be relying on > getting > > >> an > > >> > >> >>>> empty > > >> > >> >>>>>>>>> iterator. > > >> > >> >>>>>>>>> > > >> > >> >>>>>>>>> My counterproposal was to actually change this > contract to > > >> say > > >> > >> >>>> that > > >> > >> >>>>>> the > > >> > >> >>>>>>>>> store > > >> > >> >>>>>>>>> must return an iterator over the keys in that range, > but > > >> in the > > >> > >> >>>>>> reverse > > >> > >> >>>>>>>>> order. > > >> > >> >>>>>>>>> So, in addition to considering whether this idea is > "too > > >> fancy" > > >> > >> >>>> (aka > > >> > >> >>>>>>>>> confusing), > > >> > >> >>>>>>>>> we should also consider the likelihood of breaking an > > >> existing > > >> > >> >>>>>> program with > > >> > >> >>>>>>>>> this behavior/contract change. > > >> > >> >>>>>>>>> > > >> > >> >>>>>>>>> To echo your clarification, I'm also not advocating > > >> strongly in > > >> > >> >>>>> favor > > >> > >> >>>>>> of my > > >> > >> >>>>>>>>> proposal. I just wanted to present it for consideration > > >> > >> alongside > > >> > >> >>>>>> Jorge's > > >> > >> >>>>>>>>> original one. > > >> > >> >>>>>>>>> > > >> > >> >>>>>>>>> Thanks for raising these very good points, > > >> > >> >>>>>>>>> -John > > >> > >> >>>>>>>>> > > >> > >> >>>>>>>>> On Tue, May 19, 2020, at 20:49, Sophie Blee-Goldman > wrote: > > >> > >> >>>>>>>>>>> Rather than working around it, I think we should just > > >> fix it > > >> > >> >>>>>>>>>> Now *that's* a "fancy" idea :P > > >> > >> >>>>>>>>>> > > >> > >> >>>>>>>>>> That was my primary concern, although I do have a > vague > > >> sense > > >> > >> of > > >> > >> >>>>>> worry > > >> > >> >>>>>>>>>> that we might be allowing users to get into trouble > > >> without > > >> > >> >>>>>> realizing it. > > >> > >> >>>>>>>>>> For example if their custom serdes suffer a similar > bug > > >> as the > > >> > >> >>>>> above, > > >> > >> >>>>>>>>>> and/or > > >> > >> >>>>>>>>>> they rely on getting results in increasing order (of > the > > >> keys) > > >> > >> >>>> even > > >> > >> >>>>>> when > > >> > >> >>>>>>>>>> to < from. Maybe they're relying on the fact that the > > >> range > > >> > >> >>>> query > > >> > >> >>>>>> returns > > >> > >> >>>>>>>>>> nothing in that case. > > >> > >> >>>>>>>>>> > > >> > >> >>>>>>>>>> Not sure if that qualifies as relying on a bug or not, > > >> but in > > >> > >> >>>> that > > >> > >> >>>>>> past > > >> > >> >>>>>>>>>> we've > > >> > >> >>>>>>>>>> taken the stance that we should not break > compatibility > > >> even if > > >> > >> >>>> the > > >> > >> >>>>>> user > > >> > >> >>>>>>>>>> was relying on bugs or unintentional behavior. > > >> > >> >>>>>>>>>> > > >> > >> >>>>>>>>>> Just to clarify I'm not advocating strongly against > this > > >> > >> >>>> proposal, > > >> > >> >>>>>> just > > >> > >> >>>>>>>>>> laying > > >> > >> >>>>>>>>>> out some considerations we should take into account. > At > > >> the end > > >> > >> >>>> of > > >> > >> >>>>>> the > > >> > >> >>>>>>>>> day > > >> > >> >>>>>>>>>> we should do what's right rather than maintain > > >> compatibility > > >> > >> >>>> with > > >> > >> >>>>>>>>> existing > > >> > >> >>>>>>>>>> bugs, but sometimes there's a reasonable middle > ground. > > >> > >> >>>>>>>>>> > > >> > >> >>>>>>>>>> On Tue, May 19, 2020 at 6:15 PM John Roesler < > > >> > >> >>>> vvcep...@apache.org> > > >> > >> >>>>>>>>> wrote: > > >> > >> >>>>>>>>>>> Thanks Sophie, > > >> > >> >>>>>>>>>>> > > >> > >> >>>>>>>>>>> Woah, that’s a nasty bug. Rather than working around > it, > > >> I > > >> > >> >>>> think > > >> > >> >>>>> we > > >> > >> >>>>>>>>> should > > >> > >> >>>>>>>>>>> just fix it. I’ll leave some comments on the Jira. > > >> > >> >>>>>>>>>>> > > >> > >> >>>>>>>>>>> It doesn’t seem like it should be this KIP’s concern > > >> that some > > >> > >> >>>>>> serdes > > >> > >> >>>>>>>>>>> might be incorrectly written. > > >> > >> >>>>>>>>>>> > > >> > >> >>>>>>>>>>> Were there other practical concerns that you had in > mind? > > >> > >> >>>>>>>>>>> > > >> > >> >>>>>>>>>>> Thanks, > > >> > >> >>>>>>>>>>> John > > >> > >> >>>>>>>>>>> > > >> > >> >>>>>>>>>>> On Tue, May 19, 2020, at 19:10, Sophie Blee-Goldman > > >> wrote: > > >> > >> >>>>>>>>>>>> I like this "fancy idea" to just flip the to/from > bytes > > >> but I > > >> > >> >>>>> think > > >> > >> >>>>>>>>> there > > >> > >> >>>>>>>>>>>> are some practical limitations to implementing > this. In > > >> > >> >>>>> particular > > >> > >> >>>>>>>>>>>> I'm thinking about this issue > > >> > >> >>>>>>>>>>>> <https://issues.apache.org/jira/browse/KAFKA-8159> > > >> with the > > >> > >> >>>>>> built-in > > >> > >> >>>>>>>>>>> signed > > >> > >> >>>>>>>>>>>> number serdes. > > >> > >> >>>>>>>>>>>> > > >> > >> >>>>>>>>>>>> This trick would actually fix the problem for > > >> > >> >>>> negative-negative > > >> > >> >>>>>>>>> queries > > >> > >> >>>>>>>>>>>> (ie where to & from are negative) but would cause > > >> > >> undetectable > > >> > >> >>>>>>>>>>>> incorrect results for negative-positive queries. For > > >> example, > > >> > >> >>>> say > > >> > >> >>>>>> you > > >> > >> >>>>>>>>>>>> call #range with from = -1 and to = 1, using the > Short > > >> > >> serdes. > > >> > >> >>>>> The > > >> > >> >>>>>>>>>>>> serialized bytes for that are > > >> > >> >>>>>>>>>>>> > > >> > >> >>>>>>>>>>>> from = 1111111111111111 > > >> > >> >>>>>>>>>>>> to = 0000000000000001 > > >> > >> >>>>>>>>>>>> > > >> > >> >>>>>>>>>>>> so we would end up flipping those and iterating over > > >> all keys > > >> > >> >>>>> from > > >> > >> >>>>>>>>>>>> 0000000000000001 to 1111111111111111. Iterating in > > >> > >> >>>>> lexicographical > > >> > >> >>>>>>>>>>>> order means we would iterate over every key in the > space > > >> > >> >>>> *except* > > >> > >> >>>>>> for > > >> > >> >>>>>>>>>>>> 0, but 0 is actually the *only* other key we meant > to be > > >> > >> >>>> included > > >> > >> >>>>>> in > > >> > >> >>>>>>>>> the > > >> > >> >>>>>>>>>>>> range query. > > >> > >> >>>>>>>>>>>> > > >> > >> >>>>>>>>>>>> Currently we just log a warning and return an empty > > >> iterator > > >> > >> >>>> when > > >> > >> >>>>>>>>>>>> to < from, which is obviously also incorrect but > feels > > >> > >> >>>> slightly > > >> > >> >>>>>> more > > >> > >> >>>>>>>>>>>> palatable. If we start automatically converting to > > >> reverse > > >> > >> >>>>> queries > > >> > >> >>>>>> we > > >> > >> >>>>>>>>>>>> can't even log a warning in this case unless we > wanted > > >> to log > > >> > >> >>>> a > > >> > >> >>>>>>>>> warning > > >> > >> >>>>>>>>>>>> every time, which would be weird to do for a valid > > >> usage of a > > >> > >> >>>> new > > >> > >> >>>>>>>>>>>> feature. > > >> > >> >>>>>>>>>>>> > > >> > >> >>>>>>>>>>>> All that said, I still like the idea overall. Off > the > > >> top of > > >> > >> >>>> my > > >> > >> >>>>>> head > > >> > >> >>>>>>>>> I > > >> > >> >>>>>>>>>>> guess > > >> > >> >>>>>>>>>>>> we could add a store config to enable/disable > automatic > > >> > >> >>>> reverse > > >> > >> >>>>>>>>>>> iteration, > > >> > >> >>>>>>>>>>>> which is off by default? > > >> > >> >>>>>>>>>>>> > > >> > >> >>>>>>>>>>>> Thanks for the KIP! This will be a nice addition > > >> > >> >>>>>>>>>>>> > > >> > >> >>>>>>>>>>>> Sophie > > >> > >> >>>>>>>>>>>> > > >> > >> >>>>>>>>>>>> > > >> > >> >>>>>>>>>>>> On Tue, May 19, 2020 at 3:21 PM John Roesler < > > >> > >> >>>>> vvcep...@apache.org> > > >> > >> >>>>>>>>>>> wrote: > > >> > >> >>>>>>>>>>>>> Hi there Jorge, > > >> > >> >>>>>>>>>>>>> > > >> > >> >>>>>>>>>>>>> Thanks for the KIP! > > >> > >> >>>>>>>>>>>>> > > >> > >> >>>>>>>>>>>>> I think this feature sounds very reasonable. > > >> > >> >>>>>>>>>>>>> > > >> > >> >>>>>>>>>>>>> I'm not 100% sure if this is "too fancy", but what > do > > >> you > > >> > >> >>>> think > > >> > >> >>>>>>>>>>>>> about avoiding the enum by instead allowing people > to > > >> flip > > >> > >> >>>>>>>>>>>>> the "from" and "to" endpoints? I.e., reading from > "A" > > >> to "Z" > > >> > >> >>>>> would > > >> > >> >>>>>>>>>>>>> be a forward scan, and from "Z" to "A" would be a > > >> backward > > >> > >> >>>> one? > > >> > >> >>>>>>>>>>>>> > > >> > >> >>>>>>>>>>>>> Thanks, > > >> > >> >>>>>>>>>>>>> -John > > >> > >> >>>>>>>>>>>>> > > >> > >> >>>>>>>>>>>>> On Tue, May 19, 2020, at 16:20, Jorge Quilcate > wrote: > > >> > >> >>>>>>>>>>>>>> Hi everyone, > > >> > >> >>>>>>>>>>>>>> > > >> > >> >>>>>>>>>>>>>> I would like to start the discussion for KIP-617: > > >> > >> >>>>>>>>>>>>>> > > >> > >> >>>>>>>>> > > >> > >> >>>>>> > > >> > >> >>>>> > > >> > >> >>>> > > >> > >> > > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-617%3A+Allow+Kafka+Streams+State+Stores+to+be+iterated+backwards > > >> > >> >>>>>>>>>>>>>> Looking forward to your feedback. > > >> > >> >>>>>>>>>>>>>> > > >> > >> >>>>>>>>>>>>>> Thanks! > > >> > >> >>>>>>>>>>>>>> Jorge. > > >> > >> >>>>>>>>>>>>>> > > >> > >> >>>>>>>>>>>>>> > > >> > >> >>>>>>>>>>>>>> Attachments: > > >> > >> >>>>>>>>>>>>>> * 0x5F2C6E22064982DF.asc > > >> > >> >>>>>>> > > >> > >> >>>>>>> > > >> > >> >>>>>>> Attachments: > > >> > >> >>>>>>> * 0x5F2C6E22064982DF.asc > > >> > >> >>>>>> > > >> > >> >>>>> > > >> > >> >>>> > > >> > >> >>> > > >> > >> >> > > >> > >> > > >> > >> > > >> > > > >> > > > > > >