spena opened a new pull request #10331: URL: https://github.com/apache/kafka/pull/10331
The new `RocksDBTimeOrderedWindowStore` stores key/value records with a combined key that starts with the record timestamp, followed by a sequential number and the key in bytes (`<time,seq,key>`). This key combination will keep all records ordered by timestamp in the RocksDB store. The advantage of keeping records this way is to make range queries functions more efficient, like `fetchAll(from, to)`. This range method returns all keys in the specified time range using a more efficient key schema to iterate on RocksDB. `RocksDBTimeOrderedWindowStore` with key query ranges, like `fetch(key, from, to)` or `fetch(keyFrom, keyTo, from, to)`, do not perform very well. For cases with key query ranges, it is better to use the current `RocksDBWindowStore` which stores records using the record key as the RocksDB key prefix. `RocksDBTimeOrderedWindowStore` is meant to fix the issue with https://issues.apache.org/jira/browse/KAFKA-10847 which requires a temporary store where to hold non-joined records, and later do a query range of all keys in a specified time range. The PR also adds a new bytes lexico comparator using the key prefixes in the `RocksDBRangeIterator`. ``` comparator.compare(0001, 0001000F); // smallest key prefix is 4 bytes, so 0001 == 0001 comparator.compare(0001000F, 0001); // smallest key prefix is 4 bytes, so 0001 == 0001 comparator.compare(0002000F, 0001); // smallest key prefix is 4 bytes, so 0002 > 0001 comparator.compare(0001000F, 0002); // smallest key prefix is 4 bytes, so 0001 < 0002 ``` This new prefix bytes lexico comparator is used when `range` and `reverseRange` iterators are called with prefixScan=True. Otherwise, the current bytes lexico comparator that compares the full key is used. ``` KeyValueIterator<K, V> range(K from, K to, boolean prefixScan) KeyValueIterator<K, V> reverseRange(K from, K to, boolean prefixScan) ``` *Summary of testing strategy (including rationale) for the feature or bug fix. Unit and/or integration tests are expected for any behaviour change and system tests should be considered for larger changes.* - Added unit tests ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
