Thanks for all of the feedback. Some comments below: Luke wrote: > 1. Jason has asked but you didn't answer: What is the default value for ` > metadata.monitor.write.interval.ms`?
Thanks for asking again. Looks like I missed this in my previous reply. In the implementation I am currently working on, I have it at 1 second. Colin is suggesting 500ms. I should point out that the fetch timeout default is currently 2 seconds. I'll go with Colin's suggestion and make it 500ms. > 2. The `noopRecord` API key is `TBD`. Why can't we put the "currently used > API Key nums + 1" into it? Any concern? Yes. It will be the highest currently used API key for metadata records plus one. I didn't specify it yet since the code hasn't been committed. I usually update the KIP once the code is committed to trunk. > 3. typo: name=MetadataWriteOffse[t]s Fixed. > 4. I don't understand the difference between MetadataLastCommittedOffset > and MetadataWriteOffsets metrics. I think the 2 values will always be > dentical, unless the controller metadata write failed, is that correct? The active controller tracks two offsets, the writeOffset and the lastCommittedOffset. The writeOffset is the largest offset that was sent to the KRaft layer to get committed. The lastCommittedOffset is the largest offset that the controller knows got committed. The controller increases the lastCommittedOffset as the KRaft layer replicates records, commits records and finally the controller handles the committed record. Colin suggested calling this the last-applied-offset-*`. I agree with this rename as it more accurately represents the state being reported. I updated the description of those metrics. Please take a look. David wrote: > 1. Based on the config name "metadata.monitor.write.interval.ms" I'm > guessing the intention is to have a regularly scheduled write. If the > quorum is busy with lots of the writes, we wouldn't need this NoopRecord > ight? Maybe we can instead write a NoopRecord only after some amount of > idle time. Correct. I suspect that the first implementation will always write this record in the interval configured. Future implementation can optimize this and only write the NoopRecord if the active controller didn't already write a record in this time interval. > 2. A small naming suggestion, what about "NoOpRecord"? Sounds good to me. I changed the record to NoOpRecord. > 4. We should consider omitting these records from the log dump tool, or at > least adding an option to skip over them. Thanks for pointing this out. DumpLogSegments uses MetadataRecordSerde so it should just work as long as it is the same software version as the Kafka node. The issue is if the user uses an old DumpLogSegments on a metadata log that supports this feature. I updated the "Compatibility, Deprecation, and Migration Plan" section to document this > 5. If (somehow) one of these NoopRecord appeared in the snapshot, it sounds > like the broker/controller would skip over them. We should specify this > behavior in the KIP Excluding bugs they should not be part of the metadata snapshot because the controllers will not store this information in the "timeline" data types and the brokers will not include this information in the "metadata image". In other words both the controller and brokers will skip this record when replaying the log and snapshot. Both `handleCommit` and `handleSnapshot` eventually execute the same "replay" code. Colin wrote: > I had the same question as Luke and Jason: what's the default here for the > NoOpRecord time? :) We should add a value here even if we think we'll adjust > it later, just to give a feeling for how much traffic this would create. > Perhaps 500 ms? I agree. I updated the KIP to document this as 500ms. > Also how about naming the time configuration something like > "metadata.max.idle.interval.ms", to make it clear that this is the maximum > time we can go between writes to the metadata log. I don't get that meaning > out of "monitor interval"... I like this suggestion. I think this name makes it less implementation specific and allows us to not write NoOpRecords if the active controller already wrote a record in the configured time interval. > I'd suggest renaming this as "last-applied-offset-*" to make it clear that > the offset we're measuring is the last one that the broker applied. The last > committed offset is something else, a more log-layer-centric concept. I agree. I change the KIP to prefix these metrics with "last applied record ...". > (It would also be good to have a separate metric which purely reflected the > last metadata offset we've fetched and not applied, so we can see if the > delta between that and last-applied-offset increases...) In KIP-595 we have this metric which reports this information: 1. log-end-offset, type=raft-manager, dynamic gauge Here are the changes to the KIP: https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=211883219&selectedPageVersions=6&selectedPageVersions=5 Thanks for the feedback everyone. Hopefully this time I didn't miss any question or suggestion, -José