Re: [DISCUSS] KIP-835: Monitor KRaft Controller Quorum Health

José Armando García Sancio Wed, 11 May 2022 17:41:33 -0700

Thanks for all of the feedback. Some comments below:

Luke wrote:
> 1. Jason has asked but you didn't answer: What is the default value for `
> metadata.monitor.write.interval.ms`?

Thanks for asking again. Looks like I missed this in my previous
reply. In the implementation I am currently working on, I have it at 1
second. Colin is suggesting 500ms. I should point out that the fetch
timeout default is currently 2 seconds. I'll go with Colin's
suggestion and make it 500ms.

> 2. The `noopRecord` API key is `TBD`. Why can't we put the "currently used
> API Key nums + 1" into it? Any concern?

Yes. It will be the highest currently used API key for metadata
records plus one. I didn't specify it yet since the code hasn't been
committed. I usually update the KIP once the code is committed to
trunk.

> 3. typo: name=MetadataWriteOffse[t]s

Fixed.

> 4. I don't understand the difference between MetadataLastCommittedOffset
> and MetadataWriteOffsets metrics. I think the 2 values will always be
> dentical, unless the controller metadata write failed, is that correct?

The active controller tracks two offsets, the writeOffset and the
lastCommittedOffset. The writeOffset is the largest offset that was
sent to the KRaft layer to get committed. The lastCommittedOffset is
the largest offset that the controller knows got committed. The
controller increases the lastCommittedOffset as the KRaft layer
replicates records, commits records and finally the controller handles
the committed record. Colin suggested calling this the
last-applied-offset-*`. I agree with this rename as it more accurately
represents the state being reported.

I updated the description of those metrics. Please take a look.

David wrote:
> 1. Based on the config name "metadata.monitor.write.interval.ms" I'm
> guessing the intention is to have a regularly scheduled write. If the
> quorum is busy with lots of the writes, we wouldn't need this NoopRecord
> ight? Maybe we can instead write a NoopRecord only after some amount of
> idle time.

Correct. I suspect that the first implementation will always write
this record in the interval configured. Future implementation can
optimize this and only write the NoopRecord if the active controller
didn't already write a record in this time interval.

> 2. A small naming suggestion, what about "NoOpRecord"?

Sounds good to me. I changed the record to NoOpRecord.

> 4. We should consider omitting these records from the log dump tool, or at
> least adding an option to skip over them.

Thanks for pointing this out. DumpLogSegments uses MetadataRecordSerde
so it should just work as long as it is the same software version as
the Kafka node. The issue is if the user uses an old DumpLogSegments
on a metadata log that supports this feature.

I updated the "Compatibility, Deprecation, and Migration Plan" section
to document this

> 5. If (somehow) one of these NoopRecord appeared in the snapshot, it sounds
> like the broker/controller would skip over them. We should specify this
> behavior in the KIP

Excluding bugs they should not be part of the metadata snapshot
because the controllers will not store this information in the
"timeline" data types and the brokers will not include this
information in the "metadata image". In other words both the
controller and brokers will skip this record when replaying the log
and snapshot. Both `handleCommit` and `handleSnapshot` eventually
execute the same "replay" code.

Colin wrote:
> I had the same question as Luke and Jason: what's the default here for the 
> NoOpRecord time? :) We should add a value here even if we think we'll adjust 
> it later, just to give a feeling for how much traffic this would create. 
> Perhaps 500 ms?

I agree. I updated the KIP to document this as 500ms.

> Also how about naming the time configuration something like 
> "metadata.max.idle.interval.ms", to make it clear that this is the maximum 
> time we can go between writes to the metadata log. I don't get that meaning 
> out of "monitor interval"...

I like this suggestion. I think this name makes it less implementation
specific and allows us to not write NoOpRecords if the active
controller already wrote a record in the configured time interval.

> I'd suggest renaming this as "last-applied-offset-*" to make it clear that 
> the offset we're measuring is the last one that the broker applied. The last 
> committed offset is something else, a more log-layer-centric concept.

I agree. I change the KIP to prefix these metrics with "last applied
record ...".

> (It would also be good to have a separate metric which purely reflected the 
> last metadata offset we've fetched and not applied, so we can see if the 
> delta between that and last-applied-offset increases...)

In KIP-595 we have this metric which reports this information:
1. log-end-offset, type=raft-manager, dynamic gauge

Here are the changes to the KIP:
https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=211883219&selectedPageVersions=6&selectedPageVersions=5

Thanks for the feedback everyone. Hopefully this time I didn't miss
any question or suggestion,
-José

Re: [DISCUSS] KIP-835: Monitor KRaft Controller Quorum Health

Reply via email to