alanlau28 opened a new pull request, #22440: URL: https://github.com/apache/kafka/pull/22440
Jira: https://issues.apache.org/jira/browse/KAFKA-14276

### Summary

Clarifies in Javadoc that `KeyValueStore.approximateNumEntries()` may return a count substantially larger than the number of live keys that iteration would yield, particularly when called from `Processor.init()` immediately after restoration.

### Scenario

For RocksDB-backed stores (the default), `approximateNumEntries()` returns the `rocksdb.estimate-num-keys` property, which sums per-SST and memtable entry counts without doing a merged read. Restoring a store from its changelog replays every record in the topic, including the un-compacted tail (the active segment is never compacted by the broker). As a result, duplicate updates for the same key and tombstones all land as separate entries across the memtable and multiple SST files. The estimate counts each of those entries, whereas iteration uses RocksDB's merged read view and returns the de-duplicated live count, skipping tombstones.

The two converge as background RocksDB compaction proceeds, but at the moment `Processor.init()` runs they can differ by a large factor. `InMemoryKeyValueStore` is not affected: it is backed by a `Map`, so duplicates overwrite and tombstones delete.

Concretely, with a workload of 1000 unique keys, about 5 overwrites each for a third of them, and 200 tombstones (leaving 800 live keys):

| Store / state | `approximateNumEntries()` | Iteration count |
|----------------------------------------------|---------------------------|-----------------|
| RocksDB, just after restore (memtable + SST) | 2465 | 800 |
| RocksDB, after flush (no compaction) | 1133 | 800 |
| RocksDB, after full `compactRange()` | 800 | 800 |
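The gap between an entry-count estimate and a merged-read count can be sketched with a toy model in plain Java. This is a hypothetical simulation, not Kafka Streams or RocksDB code: `Entry`, `estimateNumEntries`, `mergedView` and `compactAll` are illustrative names. The estimate is assumed to be a plain sum of entries per file, following the description above. RocksDB's real `rocksdb.estimate-num-keys` may apply its own corrections (for example for deletions), so the absolute numbers will not match the table. Files are ordered oldest to newest, and the merged view lets later writes win while tombstones hide keys. The `record` type needs Java 16 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: not Kafka Streams or RocksDB code.
public class ApproxCountDemo {

    // One changelog record; a null value models a tombstone.
    record Entry(String key, String value) {}

    // Hypothetical estimate: sum the entry counts of every "file" (memtable or SST)
    // without a merged read, so overwrites and tombstones are all counted.
    static long estimateNumEntries(List<List<Entry>> files) {
        long sum = 0;
        for (List<Entry> file : files) {
            sum += file.size();
        }
        return sum;
    }

    // Merged read over files ordered oldest to newest: later writes win,
    // and a tombstone hides any earlier value for its key.
    static Map<String, String> mergedView(List<List<Entry>> files) {
        Map<String, String> live = new LinkedHashMap<>();
        for (List<Entry> file : files) {
            for (Entry e : file) {
                if (e.value() == null) {
                    live.remove(e.key());
                } else {
                    live.put(e.key(), e.value());
                }
            }
        }
        return live;
    }

    // Hypothetical full compaction: rewrite everything as one file of live entries only.
    static List<List<Entry>> compactAll(List<List<Entry>> files) {
        List<Entry> compacted = new ArrayList<>();
        mergedView(files).forEach((k, v) -> compacted.add(new Entry(k, v)));
        List<List<Entry>> result = new ArrayList<>();
        result.add(compacted);
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical changelog: 10 keys, 3 of them overwritten 3 more times, then 2 tombstones.
        List<Entry> changelog = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            changelog.add(new Entry("k" + i, "v0"));
        }
        for (int round = 1; round <= 3; round++) {
            for (int i = 0; i < 3; i++) {
                changelog.add(new Entry("k" + i, "v" + round));
            }
        }
        changelog.add(new Entry("k8", null));
        changelog.add(new Entry("k9", null));

        // Restore replays the changelog verbatim, cutting a new "file" every 6 records (arbitrary size).
        List<List<Entry>> files = new ArrayList<>();
        for (int start = 0; start < changelog.size(); start += 6) {
            int end = Math.min(start + 6, changelog.size());
            files.add(new ArrayList<>(changelog.subList(start, end)));
        }

        System.out.println("after restore: estimate=" + estimateNumEntries(files)
                + " live=" + mergedView(files).size());

        List<List<Entry>> compacted = compactAll(files);
        System.out.println("after compaction: estimate=" + estimateNumEntries(compacted)
                + " live=" + mergedView(compacted).size());
    }
}
```

Running `main` prints `after restore: estimate=21 live=8` followed by `after compaction: estimate=8 live=8`: the 21 replayed records are all counted until compaction collapses them into the 8 live keys. The model skips the "after flush" row of the table; a real flush can already discard overwritten versions within a single memtable, so that state falls between the two shown here.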
