Thanks for the reply, Jun. Some comments below. Here are the changes: https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=158864763&selectedPageVersions=27&selectedPageVersions=26
> 20. Good point on metadata cache. I think we need to make a decision
> consistently. For example, if we decide that dedicated voter nodes don't
> serve metadata requests, then we don't need to expose the voters host/port
> to the client. Which KIP should make this decision?

Makes sense. My opinion is that this should be addressed in KIP-631, since exposing this information is independent of snapshotting. Note that there is a long-term goal to make the __cluster_metadata topic partition readable by a Kafka Consumer, but we can address that in a future KIP.

> 31. controller.snapshot.minimum.records: For a compacted topic, we use a
> ratio instead of the number of records to determine when to compact. This
> has some advantages. For example, if we use
> controller.snapshot.minimum.records and set it to 1000, then it will
> trigger the generation of a new snapshot when the existing snapshot is
> either 10MB or 1GB. Intuitively, the larger the snapshot, the more
> expensive it is to write to disk. So, we want to wait for more data to be
> accumulated before generating the next snapshot. The ratio based setting
> achieves this. For instance, a 50% ratio requires 10MB/1GB more data to be
> accumulated to regenerate a 10MB/1GB snapshot respectively.

I agree. I proposed a simple algorithm like "controller.snapshot.minimum.records" because calculating a dirty ratio may not be straightforward when replicated log records don't map 1:1 to snapshot records. But I think we can implement a heuristic for this. There is a small complication when generating the first snapshot, but it should be implementable.

Here is the latest wording of the "When to Snapshot" section:

If the Kafka Controller generates a snapshot too frequently then it can negatively affect the performance of the disk.
If it doesn't generate a snapshot often enough then it can increase the amount of time it takes to load its state into memory and it can increase the amount of space taken up by the replicated log. The Kafka Controller will have a new configuration option, controller.snapshot.min.cleanable.ratio. If the number of snapshot records that have changed (deleted or modified) between the latest snapshot and the current in-memory state, divided by the total number of snapshot records, is greater than controller.snapshot.min.cleanable.ratio, then the Kafka Controller will generate a new snapshot. Note that new snapshot records don't count against this ratio: if a snapshot record was added since the last snapshot then it doesn't affect the dirty ratio, but if a snapshot record was added and then modified or deleted then it does count against the dirty ratio.

> 32. max.replication.lag.ms: It seems this is specific to the metadata
> topic. Could we make that clear in the name?

Good catch. Done.
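To make the "When to Snapshot" wording above concrete, here is a minimal sketch of the dirty-ratio check in Java. This is purely illustrative: the class and method names (SnapshotHeuristic, recordChanged, shouldSnapshot) are hypothetical and not part of any actual KIP-630 API, and the first-snapshot case is only stubbed out since the KIP notes it needs separate handling.

```java
// Hypothetical sketch of the dirty-ratio heuristic described in the
// "When to Snapshot" section. All names here are illustrative.
public class SnapshotHeuristic {
    // controller.snapshot.min.cleanable.ratio
    private final double minCleanableRatio;
    // Total number of records in the latest snapshot.
    private final long snapshotRecords;
    // Snapshot records deleted or modified since the latest snapshot.
    private long changedRecords = 0;

    public SnapshotHeuristic(double minCleanableRatio, long snapshotRecords) {
        this.minCleanableRatio = minCleanableRatio;
        this.snapshotRecords = snapshotRecords;
    }

    // Called when a record that existed in the latest snapshot is
    // modified or deleted. Records added since the last snapshot do NOT
    // call this: new records don't count against the dirty ratio.
    public void recordChanged() {
        changedRecords++;
    }

    public boolean shouldSnapshot() {
        if (snapshotRecords == 0) {
            // First snapshot: no baseline to compute a ratio against.
            // The KIP notes this case needs a separate rule; as a
            // placeholder, snapshot as soon as anything has changed.
            return changedRecords > 0;
        }
        return (double) changedRecords / snapshotRecords > minCleanableRatio;
    }
}
```

With a ratio of 0.5 and a 1000-record snapshot, 400 changed records would not trigger a snapshot (0.4 <= 0.5), but 600 would (0.6 > 0.5), which matches the intuition from point 31 that a larger snapshot requires proportionally more accumulated changes before it is regenerated.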