Sorry for my silence, Jorge, I've just taken another look, and I'm personally happy with the KIP.
Thanks, -John On Tue, May 26, 2020, at 16:17, Jorge Esteban Quilcate Otoya wrote: > If no additional comments, I will proceed to start the a vote thread. > > Thanks a lot for your feedback! > > On Fri, May 22, 2020 at 9:25 AM Jorge Esteban Quilcate Otoya < > quilcate.jo...@gmail.com> wrote: > > > Thanks Sophie. I like the `reverseAll()` idea. > > > > I updated the KIP with your feedback. > > > > > > > > On Fri, May 22, 2020 at 4:22 AM Sophie Blee-Goldman <sop...@confluent.io> > > wrote: > > > >> Hm, the case of `all()` does seem to present a dilemma in the case of > >> variable-length keys. > >> > >> In the case of fixed-length keys, you can just compute the keys that > >> correspond > >> to the maximum and minimum serialized bytes, then perform a `range()` > >> query > >> instead of an `all()`. If your keys don't have a well-defined ordering > >> such > >> that > >> you can't determine the MAX_KEY, then you probably don't care about the > >> iterator order anyway. > >> > >> But with variable-length keys, there is no MAX_KEY. If all your keys were > >> just > >> of the form 'a', 'aa', 'aaaaa', 'aaaaaaa' then in fact the only way to > >> figure out the > >> maximum key in the store is by using `all()` -- and without a reverse > >> iterator, you're > >> doomed to iterate through every single key just to answer that simple > >> question. > >> > >> That said, I still think determining the iterator order based on the > >> to/from bytes > >> makes a lot of intuitive sense and gives the API a nice symmetry. What if > >> we > >> solved the `all()` problem by just giving `all()` a reverse form to > >> complement it? > >> Ie we would have `all()` and `reverseAll()`, or something to that effect. > >> > >> On Thu, May 21, 2020 at 3:41 PM Jorge Esteban Quilcate Otoya < > >> quilcate.jo...@gmail.com> wrote: > >> > >> > Thanks John. > >> > > >> > Agree. I like the first approach as well, with StreamsConfig flag > >> passing > >> > by via ProcessorContext. > >> > > >> > Another positive effect with "reverse parameters" is that in the case of > >> > `fetch(keyFrom, keyTo, timeFrom, timeTo)` users can decide _which_ pair > >> to > >> > flip, whether with `ReadDirection` enum it apply to both. > >> > > >> > The only issue I've found while reviewing the KIP is that `all()` won't > >> fit > >> > within this approach. > >> > > >> > We could remove it from the KIP and argue that for WindowStore, > >> > `fetchAll(0, Long.MAX_VALUE)` can be used to get all in reverse order, > >> and > >> > for KeyValueStore, no ordering guarantees are provided. > >> > > >> > If there is consensus with this changes, I will go and update the KIP. > >> > > >> > On Thu, May 21, 2020 at 3:33 PM John Roesler <vvcep...@apache.org> > >> wrote: > >> > > >> > > Hi Jorge, > >> > > > >> > > Thanks for that idea. I agree, a feature flag would protect anyone > >> > > who may be depending on the current behavior. > >> > > > >> > > It seems better to locate the feature flag in the initialization > >> logic of > >> > > the store, rather than have a method on the "live" store that changes > >> > > its behavior on the fly. > >> > > > >> > > It seems like there are two options here, one is to add a new config: > >> > > > >> > > StreamsConfig.ENABLE_BACKWARDS_ITERATION = > >> > > "enable.backwards.iteration > >> > > > >> > > Or we can add a feature flag in Materialized, like > >> > > > >> > > Materialized.enableBackwardsIteration() > >> > > > >> > > I think I'd personally lean toward the config, for the following > >> reason. > >> > > The concern that Sophie raised is that someone's program may depend > >> > > on the existing contract of getting an empty iterator. We don't want > >> to > >> > > switch behavior when they aren't expecting it, so we provide them a > >> > > config to assert that they _are_ expecting the new behavior, which > >> > > means they take responsibility for updating their code to expect the > >> new > >> > > behavior. > >> > > > >> > > There doesn't seem to be a reason to offer a choice of behaviors on a > >> > > per-query, or per-store basis. We just want people to be not surprised > >> > > by this change in general. > >> > > > >> > > What do you think? > >> > > Thanks, > >> > > -John > >> > > > >> > > On Wed, May 20, 2020, at 17:37, Jorge Quilcate wrote: > >> > > > Thank you both for the great feedback. > >> > > > > >> > > > I like the "fancy" proposal :), and how it removes the need for > >> > > > additional API methods. And with a feature flag on `StateStore`, > >> > > > disabled by default, should no break current users. > >> > > > > >> > > > The only side-effect I can think of is that: by moving the flag > >> > upwards, > >> > > > all later operations become affected; which might be ok for most > >> (all?) > >> > > > cases. I can't think of an scenario where this would be an issue, > >> just > >> > > > want to point this out. > >> > > > > >> > > > If moving to this approach, I'd like to check if I got this right > >> > before > >> > > > updating the KIP: > >> > > > > >> > > > - only `StateStore` will change by having a new method: > >> > > > `backwardIteration()`, `false` by default to keep things compatible. > >> > > > - then all `*Stores` will have to update their implementation based > >> on > >> > > > this flag. > >> > > > > >> > > > > >> > > > On 20/05/2020 21:02, Sophie Blee-Goldman wrote: > >> > > > >> There's no possibility that someone could be relying > >> > > > >> on iterating over that range in increasing order, because that's > >> not > >> > > what > >> > > > >> happens. However, they could indeed be relying on getting an > >> empty > >> > > > > iterator > >> > > > > > >> > > > > I just meant that they might be relying on the assumption that the > >> > > range > >> > > > > query > >> > > > > will never return results with decreasing keys. The empty iterator > >> > > wouldn't > >> > > > > break that contract, but of course a surprise reverse iterator > >> would. > >> > > > > > >> > > > > FWIW I actually am in favor of automatically converting to a > >> reverse > >> > > > > iterator, > >> > > > > I just thought we should consider whether this should be off by > >> > > default or > >> > > > > even possible to disable at all. > >> > > > > > >> > > > > On Tue, May 19, 2020 at 7:42 PM John Roesler <vvcep...@apache.org > >> > > >> > > wrote: > >> > > > > > >> > > > >> Thanks for the response, Sophie, > >> > > > >> > >> > > > >> I wholeheartedly agree we should take as much into account as > >> > possible > >> > > > >> up front, rather than regretting our decisions later. I actually > >> do > >> > > share > >> > > > >> your vague sense of worry, which was what led me to say initially > >> > > that I > >> > > > >> thought my counterproposal might be "too fancy". Sometimes, it's > >> > > better > >> > > > >> to be explicit instead of "elegant", if we think more people > >> will be > >> > > > >> confused > >> > > > >> than not. > >> > > > >> > >> > > > >> I really don't think that there's any danger of "relying on a > >> bug" > >> > > here, > >> > > > >> although > >> > > > >> people certainly could be relying on current behavior. One thing > >> to > >> > be > >> > > > >> clear > >> > > > >> about (which I just left a more detailed comment in KAFKA-8159 > >> > about) > >> > > is > >> > > > >> that > >> > > > >> when we say something like key1 > key2, this ordering is defined > >> by > >> > > the > >> > > > >> serde's output and nothing else. > >> > > > >> > >> > > > >> Currently, thanks to your fix in > >> > > https://github.com/apache/kafka/pull/6521 > >> > > > >> , > >> > > > >> the store contract is that for range scans, if from > to, then > >> the > >> > > store > >> > > > >> must > >> > > > >> return an empty iterator. There's no possibility that someone > >> could > >> > be > >> > > > >> relying > >> > > > >> on iterating over that range in increasing order, because that's > >> not > >> > > what > >> > > > >> happens. However, they could indeed be relying on getting an > >> empty > >> > > > >> iterator. > >> > > > >> > >> > > > >> My counterproposal was to actually change this contract to say > >> that > >> > > the > >> > > > >> store > >> > > > >> must return an iterator over the keys in that range, but in the > >> > > reverse > >> > > > >> order. > >> > > > >> So, in addition to considering whether this idea is "too fancy" > >> (aka > >> > > > >> confusing), > >> > > > >> we should also consider the likelihood of breaking an existing > >> > > program with > >> > > > >> this behavior/contract change. > >> > > > >> > >> > > > >> To echo your clarification, I'm also not advocating strongly in > >> > favor > >> > > of my > >> > > > >> proposal. I just wanted to present it for consideration alongside > >> > > Jorge's > >> > > > >> original one. > >> > > > >> > >> > > > >> Thanks for raising these very good points, > >> > > > >> -John > >> > > > >> > >> > > > >> On Tue, May 19, 2020, at 20:49, Sophie Blee-Goldman wrote: > >> > > > >>>> Rather than working around it, I think we should just fix it > >> > > > >>> Now *that's* a "fancy" idea :P > >> > > > >>> > >> > > > >>> That was my primary concern, although I do have a vague sense of > >> > > worry > >> > > > >>> that we might be allowing users to get into trouble without > >> > > realizing it. > >> > > > >>> For example if their custom serdes suffer a similar bug as the > >> > above, > >> > > > >>> and/or > >> > > > >>> they rely on getting results in increasing order (of the keys) > >> even > >> > > when > >> > > > >>> to < from. Maybe they're relying on the fact that the range > >> query > >> > > returns > >> > > > >>> nothing in that case. > >> > > > >>> > >> > > > >>> Not sure if that qualifies as relying on a bug or not, but in > >> that > >> > > past > >> > > > >>> we've > >> > > > >>> taken the stance that we should not break compatibility even if > >> the > >> > > user > >> > > > >>> was relying on bugs or unintentional behavior. > >> > > > >>> > >> > > > >>> Just to clarify I'm not advocating strongly against this > >> proposal, > >> > > just > >> > > > >>> laying > >> > > > >>> out some considerations we should take into account. At the end > >> of > >> > > the > >> > > > >> day > >> > > > >>> we should do what's right rather than maintain compatibility > >> with > >> > > > >> existing > >> > > > >>> bugs, but sometimes there's a reasonable middle ground. > >> > > > >>> > >> > > > >>> On Tue, May 19, 2020 at 6:15 PM John Roesler < > >> vvcep...@apache.org> > >> > > > >> wrote: > >> > > > >>>> Thanks Sophie, > >> > > > >>>> > >> > > > >>>> Woah, that’s a nasty bug. Rather than working around it, I > >> think > >> > we > >> > > > >> should > >> > > > >>>> just fix it. I’ll leave some comments on the Jira. > >> > > > >>>> > >> > > > >>>> It doesn’t seem like it should be this KIP’s concern that some > >> > > serdes > >> > > > >>>> might be incorrectly written. > >> > > > >>>> > >> > > > >>>> Were there other practical concerns that you had in mind? > >> > > > >>>> > >> > > > >>>> Thanks, > >> > > > >>>> John > >> > > > >>>> > >> > > > >>>> On Tue, May 19, 2020, at 19:10, Sophie Blee-Goldman wrote: > >> > > > >>>>> I like this "fancy idea" to just flip the to/from bytes but I > >> > think > >> > > > >> there > >> > > > >>>>> are some practical limitations to implementing this. In > >> > particular > >> > > > >>>>> I'm thinking about this issue > >> > > > >>>>> <https://issues.apache.org/jira/browse/KAFKA-8159> with the > >> > > built-in > >> > > > >>>> signed > >> > > > >>>>> number serdes. > >> > > > >>>>> > >> > > > >>>>> This trick would actually fix the problem for > >> negative-negative > >> > > > >> queries > >> > > > >>>>> (ie where to & from are negative) but would cause undetectable > >> > > > >>>>> incorrect results for negative-positive queries. For example, > >> say > >> > > you > >> > > > >>>>> call #range with from = -1 and to = 1, using the Short serdes. > >> > The > >> > > > >>>>> serialized bytes for that are > >> > > > >>>>> > >> > > > >>>>> from = 1111111111111111 > >> > > > >>>>> to = 0000000000000001 > >> > > > >>>>> > >> > > > >>>>> so we would end up flipping those and iterating over all keys > >> > from > >> > > > >>>>> 0000000000000001 to 1111111111111111. Iterating in > >> > lexicographical > >> > > > >>>>> order means we would iterate over every key in the space > >> *except* > >> > > for > >> > > > >>>>> 0, but 0 is actually the *only* other key we meant to be > >> included > >> > > in > >> > > > >> the > >> > > > >>>>> range query. > >> > > > >>>>> > >> > > > >>>>> Currently we just log a warning and return an empty iterator > >> when > >> > > > >>>>> to < from, which is obviously also incorrect but feels > >> slightly > >> > > more > >> > > > >>>>> palatable. If we start automatically converting to reverse > >> > queries > >> > > we > >> > > > >>>>> can't even log a warning in this case unless we wanted to log > >> a > >> > > > >> warning > >> > > > >>>>> every time, which would be weird to do for a valid usage of a > >> new > >> > > > >>>>> feature. > >> > > > >>>>> > >> > > > >>>>> All that said, I still like the idea overall. Off the top of > >> my > >> > > head > >> > > > >> I > >> > > > >>>> guess > >> > > > >>>>> we could add a store config to enable/disable automatic > >> reverse > >> > > > >>>> iteration, > >> > > > >>>>> which is off by default? > >> > > > >>>>> > >> > > > >>>>> Thanks for the KIP! This will be a nice addition > >> > > > >>>>> > >> > > > >>>>> Sophie > >> > > > >>>>> > >> > > > >>>>> > >> > > > >>>>> On Tue, May 19, 2020 at 3:21 PM John Roesler < > >> > vvcep...@apache.org> > >> > > > >>>> wrote: > >> > > > >>>>>> Hi there Jorge, > >> > > > >>>>>> > >> > > > >>>>>> Thanks for the KIP! > >> > > > >>>>>> > >> > > > >>>>>> I think this feature sounds very reasonable. > >> > > > >>>>>> > >> > > > >>>>>> I'm not 100% sure if this is "too fancy", but what do you > >> think > >> > > > >>>>>> about avoiding the enum by instead allowing people to flip > >> > > > >>>>>> the "from" and "to" endpoints? I.e., reading from "A" to "Z" > >> > would > >> > > > >>>>>> be a forward scan, and from "Z" to "A" would be a backward > >> one? > >> > > > >>>>>> > >> > > > >>>>>> Thanks, > >> > > > >>>>>> -John > >> > > > >>>>>> > >> > > > >>>>>> On Tue, May 19, 2020, at 16:20, Jorge Quilcate wrote: > >> > > > >>>>>>> Hi everyone, > >> > > > >>>>>>> > >> > > > >>>>>>> I would like to start the discussion for KIP-617: > >> > > > >>>>>>> > >> > > > >> > >> > > > >> > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-617%3A+Allow+Kafka+Streams+State+Stores+to+be+iterated+backwards > >> > > > >>>>>>> Looking forward to your feedback. > >> > > > >>>>>>> > >> > > > >>>>>>> Thanks! > >> > > > >>>>>>> Jorge. > >> > > > >>>>>>> > >> > > > >>>>>>> > >> > > > >>>>>>> Attachments: > >> > > > >>>>>>> * 0x5F2C6E22064982DF.asc > >> > > > > >> > > > > >> > > > Attachments: > >> > > > * 0x5F2C6E22064982DF.asc > >> > > > >> > > >> > > >