Hey,

Sorry that I am late to the game. I am not 100% convinced about the
current proposal. Using a new config as feature flag seems to be rather
"nasty" to me, and flipping from/to is a little bit too fancy for my
personal taste.

I agree, that the original proposal using a "ReadDirection" enum is not
ideal either.

Thus, I would like to put out a new idea: We could add a new interface
that offers new methods that return revers iterators.

The KIP already proposes to add `reverseAll()` and it seems backward
incompatible to just add this method to `ReadOnlyKeyValueStore` and
`ReadOnlyWindowStore`. I don't think we could provide a useful default
implementation for custom stores and thus either break compatibility or
need add a default that just throws an exception. Neither seems to be a
good option.

Using a new interface avoid this issue and allows users implementing
custom stores to opt-in by adding the interface to their stores.
Furthermore, we don't need any config. In the end, we encapsulte the
change into the store, and our runtime is agnostic to it (as it should be).

The hierarchy becomes a little complex (but uses would not really see
the complexity):

// exsiting
ReadOnlyKeyValueStore
KeyValueStore extend StateStore, ReadOnlyKeyValueStore


// helper interface; users don't care
// need similar ones for other stores
ReverseReadOnlyKeyValueStore {
    KeyValueIterator<K, V> reverseRange(K from, K to);
    KeyValueIterator<K, V> reverseAll();
}


// two new user facing interfaces for kv-store
// need similar ones for other stores
ReadOnlyKeyValueStoreWithReverseIterators extends ReadOnlyKeyValueStore,
ReverseReadOnlyKeyValueStore

KeyValueStoreWithReverseIterators extends KeyValueStore,
ReverseReadOnlyKeyValueStore


// updated (also internal)
// also need to update other built-in stores
RocksDB implements KeyValueStoreWithReverseIterators, BulkLoadingStore


In the end, user would only care about the two (for kv-case) new
interface that offer revers iterator (read only and regular) and can
cast stores accordingly in their Processors/Transformers or via IQ.


Btw: if we add revers iterator for KeyValue and Window store, should we
do the same for Session store?



This might be more code to write, but I believe it provides the better
user experience. Thoughts?



-Matthias




On 5/26/20 8:47 PM, John Roesler wrote:
> Sorry for my silence, Jorge,
> 
> I've just taken another look, and I'm personally happy with the KIP.
> 
> Thanks,
> -John
> 
> On Tue, May 26, 2020, at 16:17, Jorge Esteban Quilcate Otoya wrote:
>> If no additional comments, I will proceed to start the a vote thread.
>>
>> Thanks a lot for your feedback!
>>
>> On Fri, May 22, 2020 at 9:25 AM Jorge Esteban Quilcate Otoya <
>> quilcate.jo...@gmail.com> wrote:
>>
>>> Thanks Sophie. I like the `reverseAll()` idea.
>>>
>>> I updated the KIP with your feedback.
>>>
>>>
>>>
>>> On Fri, May 22, 2020 at 4:22 AM Sophie Blee-Goldman <sop...@confluent.io>
>>> wrote:
>>>
>>>> Hm, the case of `all()` does seem to present a dilemma in the case of
>>>> variable-length keys.
>>>>
>>>> In the case of fixed-length keys, you can just compute the keys that
>>>> correspond
>>>> to the maximum and minimum serialized bytes, then perform a `range()`
>>>> query
>>>> instead of an `all()`. If your keys don't have a well-defined ordering
>>>> such
>>>> that
>>>> you can't determine the MAX_KEY, then you probably don't care about the
>>>> iterator order anyway.
>>>>
>>>>  But with variable-length keys, there is no MAX_KEY. If all your keys were
>>>> just
>>>> of the form 'a', 'aa', 'aaaaa', 'aaaaaaa' then in fact the only way to
>>>> figure out the
>>>> maximum key in the store is by using `all()` -- and without a reverse
>>>> iterator, you're
>>>> doomed to iterate through every single key just to answer that simple
>>>> question.
>>>>
>>>> That said, I still think determining the iterator order based on the
>>>> to/from bytes
>>>> makes a lot of intuitive sense and gives the API a nice symmetry. What if
>>>> we
>>>> solved the `all()` problem by just giving `all()` a reverse form to
>>>> complement it?
>>>> Ie we would have `all()` and `reverseAll()`, or something to that effect.
>>>>
>>>> On Thu, May 21, 2020 at 3:41 PM Jorge Esteban Quilcate Otoya <
>>>> quilcate.jo...@gmail.com> wrote:
>>>>
>>>>> Thanks John.
>>>>>
>>>>> Agree. I like the first approach as well, with StreamsConfig flag
>>>> passing
>>>>> by via ProcessorContext.
>>>>>
>>>>> Another positive effect with "reverse parameters" is that in the case of
>>>>> `fetch(keyFrom, keyTo, timeFrom, timeTo)` users can decide _which_ pair
>>>> to
>>>>> flip, whether with `ReadDirection` enum it apply to both.
>>>>>
>>>>> The only issue I've found while reviewing the KIP is that `all()` won't
>>>> fit
>>>>> within this approach.
>>>>>
>>>>> We could remove it from the KIP and argue that for WindowStore,
>>>>> `fetchAll(0, Long.MAX_VALUE)` can be used to get all in reverse order,
>>>> and
>>>>> for KeyValueStore, no ordering guarantees are provided.
>>>>>
>>>>> If there is consensus with this changes, I will go and update the KIP.
>>>>>
>>>>> On Thu, May 21, 2020 at 3:33 PM John Roesler <vvcep...@apache.org>
>>>> wrote:
>>>>>
>>>>>> Hi Jorge,
>>>>>>
>>>>>> Thanks for that idea. I agree, a feature flag would protect anyone
>>>>>> who may be depending on the current behavior.
>>>>>>
>>>>>> It seems better to locate the feature flag in the initialization
>>>> logic of
>>>>>> the store, rather than have a method on the "live" store that changes
>>>>>> its behavior on the fly.
>>>>>>
>>>>>> It seems like there are two options here, one is to add a new config:
>>>>>>
>>>>>> StreamsConfig.ENABLE_BACKWARDS_ITERATION =
>>>>>>   "enable.backwards.iteration
>>>>>>
>>>>>> Or we can add a feature flag in Materialized, like
>>>>>>
>>>>>> Materialized.enableBackwardsIteration()
>>>>>>
>>>>>> I think I'd personally lean toward the config, for the following
>>>> reason.
>>>>>> The concern that Sophie raised is that someone's program may depend
>>>>>> on the existing contract of getting an empty iterator. We don't want
>>>> to
>>>>>> switch behavior when they aren't expecting it, so we provide them a
>>>>>> config to assert that they _are_ expecting the new behavior, which
>>>>>> means they take responsibility for updating their code to expect the
>>>> new
>>>>>> behavior.
>>>>>>
>>>>>> There doesn't seem to be a reason to offer a choice of behaviors on a
>>>>>> per-query, or per-store basis. We just want people to be not surprised
>>>>>> by this change in general.
>>>>>>
>>>>>> What do you think?
>>>>>> Thanks,
>>>>>> -John
>>>>>>
>>>>>> On Wed, May 20, 2020, at 17:37, Jorge Quilcate wrote:
>>>>>>> Thank you both for the great feedback.
>>>>>>>
>>>>>>> I like the "fancy" proposal :), and how it removes the need for
>>>>>>> additional API methods. And with a feature flag on `StateStore`,
>>>>>>> disabled by default, should no break current users.
>>>>>>>
>>>>>>> The only side-effect I can think of is that: by moving the flag
>>>>> upwards,
>>>>>>> all later operations become affected; which might be ok for most
>>>> (all?)
>>>>>>> cases. I can't think of an scenario where this would be an issue,
>>>> just
>>>>>>> want to point this out.
>>>>>>>
>>>>>>> If moving to this approach, I'd like to check if I got this right
>>>>> before
>>>>>>> updating the KIP:
>>>>>>>
>>>>>>> - only `StateStore` will change by having a new method:
>>>>>>> `backwardIteration()`, `false` by default to keep things compatible.
>>>>>>> - then all `*Stores` will have to update their implementation based
>>>> on
>>>>>>> this flag.
>>>>>>>
>>>>>>>
>>>>>>> On 20/05/2020 21:02, Sophie Blee-Goldman wrote:
>>>>>>>>> There's no possibility that someone could be relying
>>>>>>>>> on iterating over that range in increasing order, because that's
>>>> not
>>>>>> what
>>>>>>>>> happens. However, they could indeed be relying on getting an
>>>> empty
>>>>>>>> iterator
>>>>>>>>
>>>>>>>> I just meant that they might be relying on the assumption that the
>>>>>> range
>>>>>>>> query
>>>>>>>> will never return results with decreasing keys. The empty iterator
>>>>>> wouldn't
>>>>>>>> break that contract, but of course a surprise reverse iterator
>>>> would.
>>>>>>>>
>>>>>>>> FWIW I actually am in favor of automatically converting to a
>>>> reverse
>>>>>>>> iterator,
>>>>>>>> I just thought we should consider whether this should be off by
>>>>>> default or
>>>>>>>> even possible to disable at all.
>>>>>>>>
>>>>>>>> On Tue, May 19, 2020 at 7:42 PM John Roesler <vvcep...@apache.org
>>>>>
>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks for the response, Sophie,
>>>>>>>>>
>>>>>>>>> I wholeheartedly agree we should take as much into account as
>>>>> possible
>>>>>>>>> up front, rather than regretting our decisions later. I actually
>>>> do
>>>>>> share
>>>>>>>>> your vague sense of worry, which was what led me to say initially
>>>>>> that I
>>>>>>>>> thought my counterproposal might be "too fancy". Sometimes, it's
>>>>>> better
>>>>>>>>> to be explicit instead of "elegant", if we think more people
>>>> will be
>>>>>>>>> confused
>>>>>>>>> than not.
>>>>>>>>>
>>>>>>>>> I really don't think that there's any danger of "relying on a
>>>> bug"
>>>>>> here,
>>>>>>>>> although
>>>>>>>>> people certainly could be relying on current behavior. One thing
>>>> to
>>>>> be
>>>>>>>>> clear
>>>>>>>>> about (which I just left a more detailed comment in KAFKA-8159
>>>>> about)
>>>>>> is
>>>>>>>>> that
>>>>>>>>> when we say something like key1 > key2, this ordering is defined
>>>> by
>>>>>> the
>>>>>>>>> serde's output and nothing else.
>>>>>>>>>
>>>>>>>>> Currently, thanks to your fix in
>>>>>> https://github.com/apache/kafka/pull/6521
>>>>>>>>> ,
>>>>>>>>> the store contract is that for range scans, if from > to, then
>>>> the
>>>>>> store
>>>>>>>>> must
>>>>>>>>> return an empty iterator. There's no possibility that someone
>>>> could
>>>>> be
>>>>>>>>> relying
>>>>>>>>> on iterating over that range in increasing order, because that's
>>>> not
>>>>>> what
>>>>>>>>> happens. However, they could indeed be relying on getting an
>>>> empty
>>>>>>>>> iterator.
>>>>>>>>>
>>>>>>>>> My counterproposal was to actually change this contract to say
>>>> that
>>>>>> the
>>>>>>>>> store
>>>>>>>>> must return an iterator over the keys in that range, but in the
>>>>>> reverse
>>>>>>>>> order.
>>>>>>>>> So, in addition to considering whether this idea is "too fancy"
>>>> (aka
>>>>>>>>> confusing),
>>>>>>>>> we should also consider the likelihood of breaking an existing
>>>>>> program with
>>>>>>>>> this behavior/contract change.
>>>>>>>>>
>>>>>>>>> To echo your clarification, I'm also not advocating strongly in
>>>>> favor
>>>>>> of my
>>>>>>>>> proposal. I just wanted to present it for consideration alongside
>>>>>> Jorge's
>>>>>>>>> original one.
>>>>>>>>>
>>>>>>>>> Thanks for raising these very good points,
>>>>>>>>> -John
>>>>>>>>>
>>>>>>>>> On Tue, May 19, 2020, at 20:49, Sophie Blee-Goldman wrote:
>>>>>>>>>>> Rather than working around it, I think we should just fix it
>>>>>>>>>> Now *that's* a "fancy" idea :P
>>>>>>>>>>
>>>>>>>>>> That was my primary concern, although I do have a vague sense of
>>>>>> worry
>>>>>>>>>> that we might be allowing users to get into trouble without
>>>>>> realizing it.
>>>>>>>>>> For example if their custom serdes suffer a similar bug as the
>>>>> above,
>>>>>>>>>> and/or
>>>>>>>>>> they rely on getting results in increasing order (of the keys)
>>>> even
>>>>>> when
>>>>>>>>>> to < from. Maybe they're relying on the fact that the range
>>>> query
>>>>>> returns
>>>>>>>>>> nothing in that case.
>>>>>>>>>>
>>>>>>>>>> Not sure if that qualifies as relying on a bug or not, but in
>>>> that
>>>>>> past
>>>>>>>>>> we've
>>>>>>>>>> taken the stance that we should not break compatibility even if
>>>> the
>>>>>> user
>>>>>>>>>> was relying on bugs or unintentional behavior.
>>>>>>>>>>
>>>>>>>>>> Just to clarify I'm not advocating strongly against this
>>>> proposal,
>>>>>> just
>>>>>>>>>> laying
>>>>>>>>>> out some considerations we should take into account. At the end
>>>> of
>>>>>> the
>>>>>>>>> day
>>>>>>>>>> we should do what's right rather than maintain compatibility
>>>> with
>>>>>>>>> existing
>>>>>>>>>> bugs, but sometimes there's a reasonable middle ground.
>>>>>>>>>>
>>>>>>>>>> On Tue, May 19, 2020 at 6:15 PM John Roesler <
>>>> vvcep...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>>> Thanks Sophie,
>>>>>>>>>>>
>>>>>>>>>>> Woah, that’s a nasty bug. Rather than working around it, I
>>>> think
>>>>> we
>>>>>>>>> should
>>>>>>>>>>> just fix it. I’ll leave some comments on the Jira.
>>>>>>>>>>>
>>>>>>>>>>> It doesn’t seem like it should be this KIP’s concern that some
>>>>>> serdes
>>>>>>>>>>> might be incorrectly written.
>>>>>>>>>>>
>>>>>>>>>>> Were there other practical concerns that you had in mind?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> John
>>>>>>>>>>>
>>>>>>>>>>> On Tue, May 19, 2020, at 19:10, Sophie Blee-Goldman wrote:
>>>>>>>>>>>> I like this "fancy idea" to just flip the to/from bytes but I
>>>>> think
>>>>>>>>> there
>>>>>>>>>>>> are some practical limitations to implementing this. In
>>>>> particular
>>>>>>>>>>>> I'm thinking about this issue
>>>>>>>>>>>> <https://issues.apache.org/jira/browse/KAFKA-8159> with the
>>>>>> built-in
>>>>>>>>>>> signed
>>>>>>>>>>>> number serdes.
>>>>>>>>>>>>
>>>>>>>>>>>> This trick would actually fix the problem for
>>>> negative-negative
>>>>>>>>> queries
>>>>>>>>>>>> (ie where to & from are negative) but would cause undetectable
>>>>>>>>>>>> incorrect results for negative-positive queries. For example,
>>>> say
>>>>>> you
>>>>>>>>>>>> call #range with from = -1 and to = 1, using the Short serdes.
>>>>> The
>>>>>>>>>>>> serialized bytes for that are
>>>>>>>>>>>>
>>>>>>>>>>>> from = 1111111111111111
>>>>>>>>>>>> to = 0000000000000001
>>>>>>>>>>>>
>>>>>>>>>>>> so we would end up flipping those and iterating over all keys
>>>>> from
>>>>>>>>>>>> 0000000000000001 to 1111111111111111. Iterating in
>>>>> lexicographical
>>>>>>>>>>>> order means we would iterate over every key in the space
>>>> *except*
>>>>>> for
>>>>>>>>>>>> 0, but 0 is actually the *only* other key we meant to be
>>>> included
>>>>>> in
>>>>>>>>> the
>>>>>>>>>>>> range query.
>>>>>>>>>>>>
>>>>>>>>>>>> Currently we just log a warning and return an empty iterator
>>>> when
>>>>>>>>>>>> to < from, which is obviously also incorrect but feels
>>>> slightly
>>>>>> more
>>>>>>>>>>>> palatable. If we start automatically converting to reverse
>>>>> queries
>>>>>> we
>>>>>>>>>>>> can't even log a warning in this case unless we wanted to log
>>>> a
>>>>>>>>> warning
>>>>>>>>>>>> every time, which would be weird to do for a valid usage of a
>>>> new
>>>>>>>>>>>> feature.
>>>>>>>>>>>>
>>>>>>>>>>>> All that said, I still like the idea overall. Off the top of
>>>> my
>>>>>> head
>>>>>>>>> I
>>>>>>>>>>> guess
>>>>>>>>>>>> we could add a store config to enable/disable automatic
>>>> reverse
>>>>>>>>>>> iteration,
>>>>>>>>>>>> which is off by default?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for the KIP! This will be a nice addition
>>>>>>>>>>>>
>>>>>>>>>>>> Sophie
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, May 19, 2020 at 3:21 PM John Roesler <
>>>>> vvcep...@apache.org>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>> Hi there Jorge,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for the KIP!
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think this feature sounds very reasonable.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm not 100% sure if this is "too fancy", but what do you
>>>> think
>>>>>>>>>>>>> about avoiding the enum by instead allowing people to flip
>>>>>>>>>>>>> the "from" and "to" endpoints? I.e., reading from "A" to "Z"
>>>>> would
>>>>>>>>>>>>> be a forward scan, and from "Z" to "A" would be a backward
>>>> one?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> -John
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, May 19, 2020, at 16:20, Jorge Quilcate wrote:
>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I would like to start the discussion for KIP-617:
>>>>>>>>>>>>>>
>>>>>>>>>
>>>>>>
>>>>>
>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-617%3A+Allow+Kafka+Streams+State+Stores+to+be+iterated+backwards
>>>>>>>>>>>>>> Looking forward to your feedback.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>> Jorge.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Attachments:
>>>>>>>>>>>>>> * 0x5F2C6E22064982DF.asc
>>>>>>>
>>>>>>>
>>>>>>> Attachments:
>>>>>>> * 0x5F2C6E22064982DF.asc
>>>>>>
>>>>>
>>>>
>>>
>>

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to