Thanks for the updates, Alieh!
The example in the KIP uses the allVersions() method which we agreed to
remove.
Regarding your questions:
1. asOf vs. until: I am fine with both but slightly prefer until.
2. What about KeyMultiVersionsQuery, KeyVersionsQuery (KIP-960 would
then be KeyVersionQuery). However, I am also fine with
MultiVersionedKeyQuery since none of the names sounds better or worse to
me.
3. I agree with you not to introduce the method with the two bounds to
keep things simple.
4. Forget about fromTime() an asOfTime(), from() and asOf() is fine.
5. The main purpose is to show how to use the API. Maybe make an example
with just the key to distinguish this query from the single value query
of KIP-960 and then one with a key and a time range. When you iterate
over the results you could also call validTo(). Maybe add some actual
records in the comments to show what the result might look like.
Regarding the test plan, I hope you also plan to add unit tests in all
of your KIPs. Maybe you could also explain why system tests are not
needed here.
Best,
Bruno
On 10/10/23 5:36 PM, Alieh Saeedi wrote:
Thank you all for the very exact and constructive comments. I really
enjoyed reading your ideas and all the important points you made me aware
of. I updated KIP-968 as follows:
1. If the key or time bounds are null, the method returns NPE.
2. The "valid" word: I removed the sentence "all the records that are
valid..." and replaced it with an exact explanation. More over, I explained
it with an example in the KIP but not in the javadocs. Do I need to add the
example to the javadocs as well?
3. Since I followed Bruno's suggestion and removed the allVersions()
method, the problem of meaningless combinations is solved, and I do not
need any IllegalArgumentException or something like that. Therefore, the
change is that if no time bound is specified, the query returns the records
with the specified key for all timestamps (all versions).
4. As Victoria suggested, adding a method to the *VersionedKeyValueStore
*interface is essential. So I did that. I had this method only in the
RocksDBVersionedStore class, which was not enough.
5. I added the *validTo* field to the VersionedRecord class to be able
to represent the tombstones. As you suggested, we postpone solving the
problem of retrieving consecutive tombstones for later.
6. I added the "Test Plan" section to all KIPs. I hope what I wrote is
convincing.
7. I added the *withAscendingTimestamp()* method to provide more
code readability
for the user.
8. I removed the evil word "get" from all getter methods.
There have also been some more suggestions which I am still not convinced
or clear about them:
1. Regarding asOf vs until: reading all comments, my conclusion was that
I keep it as "asOf" (following Walker's idea as the native speaker as well
as Bruno's suggestion to be consistent with single-key_single_ts queries).
But I do not have a personal preference. If you insist on "until", I change
it.
2. Bruno suggested renaming the class "MultiVersionedKeyQuery" to sth
else. We already had a long discussion about the name with Matthias. I am
open to renaming it to something else, but do you have any ideas?
3. Matthias suggested having a method with two input parameters that
enables the user to specify both time bounds in the same method. Isn't it
introducing redundancy? It is somehow disrespectful to the idea of having
composable methods.
4. Bruno suggested renaming the methods "asOf" and "from" to "asOfTime"
and "fromTime". If I do that, then it is not consistent with KIP-960.
Moreover, the input parameter is clearly a timestamp, which explains
enough. What do you think about that?
5. I was asked to add more examples to the example section. My question
is, what is the main purpose of that? If I know it clearly, then I can add
what you mean.
Cheers,
Alieh
On Tue, Oct 10, 2023 at 1:13 AM Matthias J. Sax <mj...@apache.org> wrote:
Bruno and I had some background conversation about the `get` prefix
question including a few other committers.
The official policy was never changed, and we should not add the
`get`-prefix. It's a slip on our side in previous KIPs to add the
`get`-prefix and we should actually clean it up doing a follow up KIP.
-Matthias
On 10/5/23 5:26 AM, Bruno Cadonna wrote:
Hi Matthias,
Given all the IQv2 KIPs that use getX and given recent PRs (internal
interfaces mainly) that got merged, I was under the impression that we
moved away from the strict no-getX policy.
I do not think it was an accident using getX in the IQv2 KIPs since
somebody would have brought it up, otherwise.
I am fine with both types of getters.
If we think, we need to discuss this in a broader context, let's start a
separate thread.
Best,
Bruno
On 10/5/23 7:44 AM, Matthias J. Sax wrote:
I agree to (almost) everything what Bruno said.
In general, we tend to move away from using getters without "get",
recently. So I would keep the "get".
This is new to me? Can you elaborate on this point? Why do you think
that's the case?
I actually did realize (after Walker mentioned it) that existing query
types use `get` prefix, but to me it seems that it was by accident and
we should consider correcting it? Thus, I would actually prefer to not
add the `get` prefix for new methods query types.
IMHO, we should do a follow up KIP to deprecate all methods with `get`
prefix and replace them with new ones without `get` -- it's of course
always kinda "unnecessary" noise, but if we don't do it, we might get
into more and more inconsistent naming what would result in a "bad" API.
If we indeed want to change the convention and use the `get` prefix, I
would strongly advocate to bit the bullet and do KIP to pro-actively
add the `get` "everywhere" it's missing... But overall, it seems to be
a much broader decision, and we should get buy in from many committers
about it -- as long as there is no broad consensus to add `get`
everywhere, I would strongly prefer not to diverge from the current
agreement to omit `get`.
-Matthias
On 10/4/23 2:36 AM, Bruno Cadonna wrote:
Hi,
Regarding tombstones:
As far as I understand, we need to add either a validTo field to
VersionedRecord or we need to return tombstones, otherwise the result
is not complete, because users could never know a record was deleted
at some point before the second non-null value was put.
I like more adding the validTo field since it makes the result more
concise and easier interpretable.
Extending on Victoria's example, with the following puts
put(k, v1, time=0)
put(k, null, time=5)
put(k, null, time=10)
put(k, null, time=15)
put(k, v2, time=20)
the result with tombstones would be
value, timestamp
(v1, 0)
(null, 5)
(null, 10)
(null, 15)
(v2, 20)
instead of
value, timestamp, validTo
(v1, 0, 5)
(v2, 20, null)
The benefit of conciseness would already apply to one single tombstone.
On the other hand, why would somebody write consecutive tombstones
into a versioned state store? I guess if somebody does that on
purpose, then there should be a way to retrieve each of those
tombstones, right?
So maybe we need both -- validTo field and the option to return
tombstones. The latter might be moved to a future KIP in case we see
the need.
Regarding .within(fromTs, toTs):
I would keep it simple with .from() and .asOfTimestamp() (or
.until()). If we go with .within(), I would opt for
.withinTimeRange(fromTs, toTs), because the query becomes more
readable:
MultiVersionedKeyQuery
.withKey(1)
.withinTimeRange(Instant.parse(2023-08-03T10:37:30.00Z),
Instant.parse(2023-08-04T10:37:30.00Z))
If we stay with .from() and .until(), we should consider .fromTime()
and .untilTime() (or .toTime()):
MultiVersionedKeyQuery
.withKey(1)
.fromTime(Instant.parse(2023-08-03T10:37:30.00Z))
.untilTime(Instant.parse(2023-08-04T10:37:30.00Z))
Regarding asOf vs. until:
I think asOf() is more used in point in time queries as Walker
mentioned where this KIP specifies a time range. IMO asOf() fits very
well with KIP-960 where one version is queried, but here I think
.until() fits better. That might just be a matter of taste and in the
end I am fine with both as long as it is well documented.
Regarding getters without "get":
In the other IQv2 classes we used getters with "get". In general, we
tend to move away from using getters without "get", recently. So I
would keep the "get".
Best,
Bruno
On 10/3/23 7:49 PM, Walker Carlson wrote:
Hey Alieh thanks for the KIP,
Weighing in on the AsOf vs Until debate I think either is fine from a
natural language perspective. Personally AsOf makes more sense to me
where
until gives me the idea that the query is making a change. It's
totally a
connotative difference and not that important. I think as of is pretty
frequently used in point of time queries.
Also for these methods it makes sense to drop the "get" We don't
normally use that in getters
* The key that was specified for this query.
*/
public K getKey();
/**
* The starting time point of the query, if specified
*/
public Optional<Instant> getFromTimestamp();
/**
* The ending time point of the query, if specified
*/
public Optional<Instant> getAsOfTimestamp();
Other than that I didn't have too much to add. Overall I like the
direction
of the KIP and think the funcatinlyt is all there!
best,
Walker
On Mon, Oct 2, 2023 at 10:46 PM Matthias J. Sax <mj...@apache.org>
wrote:
Thanks for the updated KIP. Overall I like it.
Victoria raises a very good point, and I personally tend to prefer (I
believe so does Victoria, but it's not totally clear from her
email) if
a range query would not return any tombstones, ie, only two records
in
Victoria's example. Thus, it seems best to include a `validTo`
ts-field
to `VersionedRecord` -- otherwise, the retrieved result cannot be
interpreted correctly.
Not sure what others think about it.
I would also be open to actually add a `includeDeletes()` (or
`includeTombstones()`) method/flag (disabled by default) to allow
users
to get all tombstone: this would only be helpful if there are two
consecutive tombstone though (if I got it right), so not sure if we
want
to add it or not -- it seems also possible to add it later if there
is
user demand for it, so it might be a premature addition as this
point?
Nit:
the public interface ValueIterator is used
"is used" -> "is added" (otherwise it sounds like as if
`ValueIterator`
exist already)
Should we also add a `.within(fromTs, toTs)` (or maybe some better
name?) to allow specifying both bounds at once? The existing
`RangeQuery` does the same for specifying the key-range, so might be
good to add for time-range too?
-Matthias
On 9/6/23 5:01 AM, Bruno Cadonna wrote:
In my last e-mail I missed to finish a sentence.
"I think from a KIP"
should be
"I think the KIP looks good!"
On 9/6/23 1:59 PM, Bruno Cadonna wrote:
Hi Alieh,
Thanks for the KIP!
I think from a KIP
1.
I propose to throw an IllegalArgumentException or an
IllegalStateException for meaningless combinations. In any case,
the
KIP should specify what exception is thrown.
2.
Why does not specifying a range return the latest version? I would
expect that it returns all versions since an empty lower or upper
limit is interpreted as no limit.
3.
I second Matthias comment about replacing "asOf" with "until" or
"to".
4.
Do we need "allVersions()"? As I said above I would return all
versions if no limits are specified. I think if we get rid of
allVersions() there might not be any meaningless combinations
anymore.
If a user applies twice the same limit like for example
MultiVersionedKeyQuery.with(key).from(t1).from(t2) the last one
wins.
5.
Could you add some more examples with time ranges to the example
section?
6.
The KIP misses the test plan section.
7.
I propose to rename the class to "MultiVersionKeyQuery" since we
are
querying multiple versions of the same key.
8.
Could you also add withAscendingTimestamps()? IMO it gives users
the
possibility to make their code more readable instead of only
relying
on the default.
Best,
Bruno
On 8/17/23 4:13 AM, Matthias J. Sax wrote:
Thanks for splitting this part into a separate KIP!
For `withKey()` we should be explicit that `null` is not allowed.
(Looking into existing `KeyQuery` it seems the JavaDocs don't
cover
this either -- would you like to do a tiny cleanup PR for this, or
fix on-the-side in one of your PRs?)
The key query returns all the records that are valid in the time
range starting from the timestamp {@code fromTimestamp}.
In the JavaDocs you use the phrase `are valid` -- I think we
need to
explain what "valid" means? It might even be worth to add some
examples. It's annoying, but being precise if kinda important.
With regard to KIP-962, should we allow `null` for time bounds ?
The
JavaDocs should also be explicit if `null` is allowed or not and
what
the semantics are if allowed.
You are using `asOf()` however, because we are doing time-range
queries, to me using `until()` to describe the upper bound would
sound better (I am not a native speaker though, so maybe I am
off?)
The key query returns all the records that have timestamp <=
{@code
asOfTimestamp}.
This is only correct if not lower-bound is set, right?
In your reply to KIP-960 you mentioned:
the meaningless combinations are prevented by throwing
exceptions.
We should add corresponding JavaDocs like:
@throws IllegalArgumentException if {@code fromTimestamp} is
equal or
larger than {@code
untilTimestamp}
Or something similar.
With regard to KIP-960: if we need to introduce a
`VersionedKeyQuery`
class for single-key-single-ts lookup, would we need to find a new
name for the query class of this KIP, given that the return type
is
different?
-Matthias
On 8/16/23 10:57 AM, Alieh Saeedi wrote:
Hi all,
I splitted KIP-960
<
https://cwiki.apache.org/confluence/display/KAFKA/KIP-960%3A+Support+single-key_single-timestamp+interactive+queries+%28IQv2%29+for+versioned+state+stores
into three separate KIPs. Therefore, please continue discussions
about single-key, multi-timestamp interactive queries here. You
can
see all
the addressed reviews on the following page. Thanks in advance.
KIP-968: Support single-key_multi-timestamp interactive queries
(IQv2) for
versioned state stores
<
https://cwiki.apache.org/confluence/display/KAFKA/KIP-968%3A+Support+single-key_multi-timestamp+interactive+queries+%28IQv2%29+for+versioned+state+stores
I look forward to your feedback!
Cheers,
Alieh