Thank you all for the very exact and constructive comments. I really enjoyed reading your ideas and all the important points you made me aware of. I updated KIP-968 as follows:
1. If the key or time bounds are null, the method returns NPE. 2. The "valid" word: I removed the sentence "all the records that are valid..." and replaced it with an exact explanation. More over, I explained it with an example in the KIP but not in the javadocs. Do I need to add the example to the javadocs as well? 3. Since I followed Bruno's suggestion and removed the allVersions() method, the problem of meaningless combinations is solved, and I do not need any IllegalArgumentException or something like that. Therefore, the change is that if no time bound is specified, the query returns the records with the specified key for all timestamps (all versions). 4. As Victoria suggested, adding a method to the *VersionedKeyValueStore *interface is essential. So I did that. I had this method only in the RocksDBVersionedStore class, which was not enough. 5. I added the *validTo* field to the VersionedRecord class to be able to represent the tombstones. As you suggested, we postpone solving the problem of retrieving consecutive tombstones for later. 6. I added the "Test Plan" section to all KIPs. I hope what I wrote is convincing. 7. I added the *withAscendingTimestamp()* method to provide more code readability for the user. 8. I removed the evil word "get" from all getter methods. There have also been some more suggestions which I am still not convinced or clear about them: 1. Regarding asOf vs until: reading all comments, my conclusion was that I keep it as "asOf" (following Walker's idea as the native speaker as well as Bruno's suggestion to be consistent with single-key_single_ts queries). But I do not have a personal preference. If you insist on "until", I change it. 2. Bruno suggested renaming the class "MultiVersionedKeyQuery" to sth else. We already had a long discussion about the name with Matthias. I am open to renaming it to something else, but do you have any ideas? 3. Matthias suggested having a method with two input parameters that enables the user to specify both time bounds in the same method. Isn't it introducing redundancy? It is somehow disrespectful to the idea of having composable methods. 4. Bruno suggested renaming the methods "asOf" and "from" to "asOfTime" and "fromTime". If I do that, then it is not consistent with KIP-960. Moreover, the input parameter is clearly a timestamp, which explains enough. What do you think about that? 5. I was asked to add more examples to the example section. My question is, what is the main purpose of that? If I know it clearly, then I can add what you mean. Cheers, Alieh On Tue, Oct 10, 2023 at 1:13 AM Matthias J. Sax <mj...@apache.org> wrote: > Bruno and I had some background conversation about the `get` prefix > question including a few other committers. > > The official policy was never changed, and we should not add the > `get`-prefix. It's a slip on our side in previous KIPs to add the > `get`-prefix and we should actually clean it up doing a follow up KIP. > > > -Matthias > > > On 10/5/23 5:26 AM, Bruno Cadonna wrote: > > Hi Matthias, > > > > Given all the IQv2 KIPs that use getX and given recent PRs (internal > > interfaces mainly) that got merged, I was under the impression that we > > moved away from the strict no-getX policy. > > > > I do not think it was an accident using getX in the IQv2 KIPs since > > somebody would have brought it up, otherwise. > > > > I am fine with both types of getters. > > > > If we think, we need to discuss this in a broader context, let's start a > > separate thread. > > > > > > Best, > > Bruno > > > > > > > > > > > > On 10/5/23 7:44 AM, Matthias J. Sax wrote: > >> I agree to (almost) everything what Bruno said. > >> > >> > >>> In general, we tend to move away from using getters without "get", > >>> recently. So I would keep the "get". > >> > >> This is new to me? Can you elaborate on this point? Why do you think > >> that's the case? > >> > >> I actually did realize (after Walker mentioned it) that existing query > >> types use `get` prefix, but to me it seems that it was by accident and > >> we should consider correcting it? Thus, I would actually prefer to not > >> add the `get` prefix for new methods query types. > >> > >> IMHO, we should do a follow up KIP to deprecate all methods with `get` > >> prefix and replace them with new ones without `get` -- it's of course > >> always kinda "unnecessary" noise, but if we don't do it, we might get > >> into more and more inconsistent naming what would result in a "bad" API. > >> > >> If we indeed want to change the convention and use the `get` prefix, I > >> would strongly advocate to bit the bullet and do KIP to pro-actively > >> add the `get` "everywhere" it's missing... But overall, it seems to be > >> a much broader decision, and we should get buy in from many committers > >> about it -- as long as there is no broad consensus to add `get` > >> everywhere, I would strongly prefer not to diverge from the current > >> agreement to omit `get`. > >> > >> > >> > >> -Matthias > >> > >> > >> > >> > >> On 10/4/23 2:36 AM, Bruno Cadonna wrote: > >>> Hi, > >>> > >>> Regarding tombstones: > >>> As far as I understand, we need to add either a validTo field to > >>> VersionedRecord or we need to return tombstones, otherwise the result > >>> is not complete, because users could never know a record was deleted > >>> at some point before the second non-null value was put. > >>> I like more adding the validTo field since it makes the result more > >>> concise and easier interpretable. > >>> > >>> Extending on Victoria's example, with the following puts > >>> > >>> put(k, v1, time=0) > >>> put(k, null, time=5) > >>> put(k, null, time=10) > >>> put(k, null, time=15) > >>> put(k, v2, time=20) > >>> > >>> the result with tombstones would be > >>> > >>> value, timestamp > >>> (v1, 0) > >>> (null, 5) > >>> (null, 10) > >>> (null, 15) > >>> (v2, 20) > >>> > >>> instead of > >>> > >>> value, timestamp, validTo > >>> (v1, 0, 5) > >>> (v2, 20, null) > >>> > >>> The benefit of conciseness would already apply to one single tombstone. > >>> > >>> On the other hand, why would somebody write consecutive tombstones > >>> into a versioned state store? I guess if somebody does that on > >>> purpose, then there should be a way to retrieve each of those > >>> tombstones, right? > >>> So maybe we need both -- validTo field and the option to return > >>> tombstones. The latter might be moved to a future KIP in case we see > >>> the need. > >>> > >>> > >>> Regarding .within(fromTs, toTs): > >>> I would keep it simple with .from() and .asOfTimestamp() (or > >>> .until()). If we go with .within(), I would opt for > >>> .withinTimeRange(fromTs, toTs), because the query becomes more > readable: > >>> > >>> MultiVersionedKeyQuery > >>> .withKey(1) > >>> .withinTimeRange(Instant.parse(2023-08-03T10:37:30.00Z), > >>> Instant.parse(2023-08-04T10:37:30.00Z)) > >>> > >>> If we stay with .from() and .until(), we should consider .fromTime() > >>> and .untilTime() (or .toTime()): > >>> > >>> MultiVersionedKeyQuery > >>> .withKey(1) > >>> .fromTime(Instant.parse(2023-08-03T10:37:30.00Z)) > >>> .untilTime(Instant.parse(2023-08-04T10:37:30.00Z)) > >>> > >>> > >>> > >>> Regarding asOf vs. until: > >>> I think asOf() is more used in point in time queries as Walker > >>> mentioned where this KIP specifies a time range. IMO asOf() fits very > >>> well with KIP-960 where one version is queried, but here I think > >>> .until() fits better. That might just be a matter of taste and in the > >>> end I am fine with both as long as it is well documented. > >>> > >>> > >>> Regarding getters without "get": > >>> In the other IQv2 classes we used getters with "get". In general, we > >>> tend to move away from using getters without "get", recently. So I > >>> would keep the "get". > >>> > >>> > >>> Best, > >>> Bruno > >>> > >>> On 10/3/23 7:49 PM, Walker Carlson wrote: > >>>> Hey Alieh thanks for the KIP, > >>>> > >>>> Weighing in on the AsOf vs Until debate I think either is fine from a > >>>> natural language perspective. Personally AsOf makes more sense to me > >>>> where > >>>> until gives me the idea that the query is making a change. It's > >>>> totally a > >>>> connotative difference and not that important. I think as of is pretty > >>>> frequently used in point of time queries. > >>>> > >>>> Also for these methods it makes sense to drop the "get" We don't > >>>> normally use that in getters > >>>> > >>>> * The key that was specified for this query. > >>>> */ > >>>> public K getKey(); > >>>> > >>>> /** > >>>> * The starting time point of the query, if specified > >>>> */ > >>>> public Optional<Instant> getFromTimestamp(); > >>>> > >>>> /** > >>>> * The ending time point of the query, if specified > >>>> */ > >>>> public Optional<Instant> getAsOfTimestamp(); > >>>> > >>>> Other than that I didn't have too much to add. Overall I like the > >>>> direction > >>>> of the KIP and think the funcatinlyt is all there! > >>>> best, > >>>> Walker > >>>> > >>>> > >>>> > >>>> On Mon, Oct 2, 2023 at 10:46 PM Matthias J. Sax <mj...@apache.org> > >>>> wrote: > >>>> > >>>>> Thanks for the updated KIP. Overall I like it. > >>>>> > >>>>> Victoria raises a very good point, and I personally tend to prefer (I > >>>>> believe so does Victoria, but it's not totally clear from her > >>>>> email) if > >>>>> a range query would not return any tombstones, ie, only two records > in > >>>>> Victoria's example. Thus, it seems best to include a `validTo` > >>>>> ts-field > >>>>> to `VersionedRecord` -- otherwise, the retrieved result cannot be > >>>>> interpreted correctly. > >>>>> > >>>>> Not sure what others think about it. > >>>>> > >>>>> I would also be open to actually add a `includeDeletes()` (or > >>>>> `includeTombstones()`) method/flag (disabled by default) to allow > >>>>> users > >>>>> to get all tombstone: this would only be helpful if there are two > >>>>> consecutive tombstone though (if I got it right), so not sure if we > >>>>> want > >>>>> to add it or not -- it seems also possible to add it later if there > is > >>>>> user demand for it, so it might be a premature addition as this > point? > >>>>> > >>>>> > >>>>> Nit: > >>>>> > >>>>>> the public interface ValueIterator is used > >>>>> > >>>>> "is used" -> "is added" (otherwise it sounds like as if > >>>>> `ValueIterator` > >>>>> exist already) > >>>>> > >>>>> > >>>>> > >>>>> Should we also add a `.within(fromTs, toTs)` (or maybe some better > >>>>> name?) to allow specifying both bounds at once? The existing > >>>>> `RangeQuery` does the same for specifying the key-range, so might be > >>>>> good to add for time-range too? > >>>>> > >>>>> > >>>>> > >>>>> -Matthias > >>>>> > >>>>> > >>>>> On 9/6/23 5:01 AM, Bruno Cadonna wrote: > >>>>>> In my last e-mail I missed to finish a sentence. > >>>>>> > >>>>>> "I think from a KIP" > >>>>>> > >>>>>> should be > >>>>>> > >>>>>> "I think the KIP looks good!" > >>>>>> > >>>>>> > >>>>>> On 9/6/23 1:59 PM, Bruno Cadonna wrote: > >>>>>>> Hi Alieh, > >>>>>>> > >>>>>>> Thanks for the KIP! > >>>>>>> > >>>>>>> I think from a KIP > >>>>>>> > >>>>>>> 1. > >>>>>>> I propose to throw an IllegalArgumentException or an > >>>>>>> IllegalStateException for meaningless combinations. In any case, > the > >>>>>>> KIP should specify what exception is thrown. > >>>>>>> > >>>>>>> 2. > >>>>>>> Why does not specifying a range return the latest version? I would > >>>>>>> expect that it returns all versions since an empty lower or upper > >>>>>>> limit is interpreted as no limit. > >>>>>>> > >>>>>>> 3. > >>>>>>> I second Matthias comment about replacing "asOf" with "until" or > >>>>>>> "to". > >>>>>>> > >>>>>>> 4. > >>>>>>> Do we need "allVersions()"? As I said above I would return all > >>>>>>> versions if no limits are specified. I think if we get rid of > >>>>>>> allVersions() there might not be any meaningless combinations > >>>>>>> anymore. > >>>>>>> If a user applies twice the same limit like for example > >>>>>>> MultiVersionedKeyQuery.with(key).from(t1).from(t2) the last one > >>>>>>> wins. > >>>>>>> > >>>>>>> 5. > >>>>>>> Could you add some more examples with time ranges to the example > >>>>> section? > >>>>>>> > >>>>>>> 6. > >>>>>>> The KIP misses the test plan section. > >>>>>>> > >>>>>>> 7. > >>>>>>> I propose to rename the class to "MultiVersionKeyQuery" since we > are > >>>>>>> querying multiple versions of the same key. > >>>>>>> > >>>>>>> 8. > >>>>>>> Could you also add withAscendingTimestamps()? IMO it gives users > the > >>>>>>> possibility to make their code more readable instead of only > relying > >>>>>>> on the default. > >>>>>>> > >>>>>>> Best, > >>>>>>> Bruno > >>>>>>> > >>>>>>> > >>>>>>> On 8/17/23 4:13 AM, Matthias J. Sax wrote: > >>>>>>>> Thanks for splitting this part into a separate KIP! > >>>>>>>> > >>>>>>>> For `withKey()` we should be explicit that `null` is not allowed. > >>>>>>>> > >>>>>>>> (Looking into existing `KeyQuery` it seems the JavaDocs don't > cover > >>>>>>>> this either -- would you like to do a tiny cleanup PR for this, or > >>>>>>>> fix on-the-side in one of your PRs?) > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>>> The key query returns all the records that are valid in the time > >>>>>>>>> range starting from the timestamp {@code fromTimestamp}. > >>>>>>>> > >>>>>>>> In the JavaDocs you use the phrase `are valid` -- I think we > >>>>>>>> need to > >>>>>>>> explain what "valid" means? It might even be worth to add some > >>>>>>>> examples. It's annoying, but being precise if kinda important. > >>>>>>>> > >>>>>>>> With regard to KIP-962, should we allow `null` for time bounds ? > >>>>>>>> The > >>>>>>>> JavaDocs should also be explicit if `null` is allowed or not and > >>>>>>>> what > >>>>>>>> the semantics are if allowed. > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> You are using `asOf()` however, because we are doing time-range > >>>>>>>> queries, to me using `until()` to describe the upper bound would > >>>>>>>> sound better (I am not a native speaker though, so maybe I am > off?) > >>>>>>>> > >>>>>>>> > >>>>>>>>> The key query returns all the records that have timestamp <= > >>>>>>>>> {@code > >>>>>>>>> asOfTimestamp}. > >>>>>>>> > >>>>>>>> This is only correct if not lower-bound is set, right? > >>>>>>>> > >>>>>>>> > >>>>>>>> In your reply to KIP-960 you mentioned: > >>>>>>>> > >>>>>>>>> the meaningless combinations are prevented by throwing > exceptions. > >>>>>>>> > >>>>>>>> We should add corresponding JavaDocs like: > >>>>>>>> > >>>>>>>> @throws IllegalArgumentException if {@code fromTimestamp} is > >>>>>>>> equal or > >>>>>>>> larger than {@code > >>>>>>>> untilTimestamp} > >>>>>>>> > >>>>>>>> Or something similar. > >>>>>>>> > >>>>>>>> > >>>>>>>> With regard to KIP-960: if we need to introduce a > >>>>>>>> `VersionedKeyQuery` > >>>>>>>> class for single-key-single-ts lookup, would we need to find a new > >>>>>>>> name for the query class of this KIP, given that the return type > is > >>>>>>>> different? > >>>>>>>> > >>>>>>>> > >>>>>>>> -Matthias > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> On 8/16/23 10:57 AM, Alieh Saeedi wrote: > >>>>>>>>> Hi all, > >>>>>>>>> > >>>>>>>>> I splitted KIP-960 > >>>>>>>>> < > >>>>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-960%3A+Support+single-key_single-timestamp+interactive+queries+%28IQv2%29+for+versioned+state+stores > >>>>>> > >>>>>>>>> into three separate KIPs. Therefore, please continue discussions > >>>>>>>>> about single-key, multi-timestamp interactive queries here. You > >>>>>>>>> can > >>>>>>>>> see all > >>>>>>>>> the addressed reviews on the following page. Thanks in advance. > >>>>>>>>> > >>>>>>>>> KIP-968: Support single-key_multi-timestamp interactive queries > >>>>>>>>> (IQv2) for > >>>>>>>>> versioned state stores > >>>>>>>>> < > >>>>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-968%3A+Support+single-key_multi-timestamp+interactive+queries+%28IQv2%29+for+versioned+state+stores > >>>>>> > >>>>>>>>> > >>>>>>>>> I look forward to your feedback! > >>>>>>>>> > >>>>>>>>> Cheers, > >>>>>>>>> Alieh > >>>>>>>>> > >>>>> > >>>> >