spena opened a new pull request #10331:
URL: https://github.com/apache/kafka/pull/10331


   The new `RocksDBTimeOrderedWindowStore` stores key/value records under a 
combined key that starts with the record timestamp, followed by a sequence 
number and the key bytes (`<time,seq,key>`). This key layout keeps all records 
ordered by timestamp in the RocksDB store. The advantage is that time-range 
queries such as `fetchAll(from, to)` become more efficient: all keys in the 
specified time range are stored next to each other, so the RocksDB iterator can 
read them in one sequential scan.
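
   A minimal sketch of how such a `<time,seq,key>` key could be encoded so that 
byte-wise ordering matches timestamp ordering. The field widths (8-byte 
timestamp, 4-byte sequence number), the class name and the method names are 
assumptions made for illustration; they are not taken from the PR.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class TimeOrderedKeySketch {
    // Hypothetical layout: big-endian timestamp, then sequence number, then key bytes.
    static byte[] toStoreKey(long timestamp, int seqnum, byte[] key) {
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES + Integer.BYTES + key.length);
        buf.putLong(timestamp); // ByteBuffer writes big-endian by default
        buf.putInt(seqnum);
        buf.put(key);
        return buf.array();
    }

    // Reads the timestamp back from the first 8 bytes of a store key.
    static long extractTimestamp(byte[] storeKey) {
        return ByteBuffer.wrap(storeKey).getLong();
    }

    public static void main(String[] args) {
        byte[] older = toStoreKey(100L, 0, "z".getBytes(StandardCharsets.UTF_8));
        byte[] newer = toStoreKey(200L, 0, "a".getBytes(StandardCharsets.UTF_8));

        // Unsigned lexicographic order follows the timestamp, not the record key
        // (this holds for non-negative timestamps).
        System.out.println(Arrays.compareUnsigned(older, newer) < 0); // true
        System.out.println(extractTimestamp(newer));                  // 200
    }
}
```

The comparison has to treat bytes as unsigned: the low byte of 200 is `0xC8`, 
which is negative as a signed Java `byte` and would otherwise sort before 100.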
   
   Queries on `RocksDBTimeOrderedWindowStore` that restrict the key, like 
`fetch(key, from, to)` or `fetch(keyFrom, keyTo, from, to)`, do not perform 
well. For those cases it is better to use the current `RocksDBWindowStore`, 
which stores records using the record key as the RocksDB key prefix. 
`RocksDBTimeOrderedWindowStore` is meant to address 
https://issues.apache.org/jira/browse/KAFKA-10847, which requires a temporary 
store to hold non-joined records and later query all keys in a specified time 
range.
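
   The trade-off can be illustrated with an in-memory sorted map standing in 
for RocksDB. This is only a sketch under assumptions: the `TreeMap`, the key 
layout (8-byte timestamp, 4-byte sequence number, key bytes) and the 
`fetchAll`/`fetch` helpers below are hypothetical and are not the store's 
actual implementation.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class TimeRangeScanSketch {
    static final int HEADER = Long.BYTES + Integer.BYTES;
    static final Comparator<byte[]> UNSIGNED = Arrays::compareUnsigned;

    // Hypothetical <time,seq,key> encoding.
    static byte[] storeKey(long ts, int seq, String key) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(HEADER + k.length).putLong(ts).putInt(seq).put(k).array();
    }

    // All entries with from <= ts <= to (assumes non-negative ts and to < Long.MAX_VALUE).
    static NavigableMap<byte[], String> timeSlice(NavigableMap<byte[], String> store, long from, long to) {
        return store.subMap(storeKey(from, 0, ""), true, storeKey(to + 1, 0, ""), false);
    }

    // Time-range query: one contiguous slice, nothing to filter.
    static List<String> fetchAll(NavigableMap<byte[], String> store, long from, long to) {
        return new ArrayList<>(timeSlice(store, from, to).values());
    }

    // Key query: still has to visit every entry in the time range and filter by key.
    static List<String> fetch(NavigableMap<byte[], String> store, String key, long from, long to) {
        byte[] wanted = key.getBytes(StandardCharsets.UTF_8);
        List<String> result = new ArrayList<>();
        for (Map.Entry<byte[], String> e : timeSlice(store, from, to).entrySet()) {
            byte[] recordKey = Arrays.copyOfRange(e.getKey(), HEADER, e.getKey().length);
            if (Arrays.equals(recordKey, wanted)) {
                result.add(e.getValue());
            }
        }
        return result;
    }

    static NavigableMap<byte[], String> sampleStore() {
        NavigableMap<byte[], String> store = new TreeMap<>(UNSIGNED);
        store.put(storeKey(100L, 0, "b"), "v1");
        store.put(storeKey(200L, 0, "a"), "v2");
        store.put(storeKey(300L, 0, "b"), "v3");
        return store;
    }

    public static void main(String[] args) {
        NavigableMap<byte[], String> store = sampleStore();
        System.out.println(fetchAll(store, 100L, 200L));   // [v1, v2]
        System.out.println(fetch(store, "b", 100L, 300L)); // [v1, v3]
    }
}
```

With the record key as the prefix instead, `fetch(key, from, to)` would be the 
contiguous slice and `fetchAll(from, to)` would be the scan-and-filter case.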
   
   The PR also adds a new lexicographic byte comparator that compares only the 
key prefixes; it is used in the `RocksDBRangeIterator`. 
   ```
   comparator.compare(0001, 0001000F); // smallest key prefix is 4 bytes, so 0001 == 0001
   comparator.compare(0001000F, 0001); // smallest key prefix is 4 bytes, so 0001 == 0001
   comparator.compare(0002000F, 0001); // smallest key prefix is 4 bytes, so 0002 > 0001
   comparator.compare(0001000F, 0002); // smallest key prefix is 4 bytes, so 0001 < 0002
   ```
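
   One way such a prefix comparison could be written is sketched below. It is 
not the comparator from the PR: the class and field names are hypothetical, and 
each digit of the keys above is assumed to stand for one byte.

```java
import java.util.Comparator;

class PrefixComparatorSketch {
    // Compares only the first min(left.length, right.length) bytes, treating
    // each byte as unsigned. Keys that agree on that prefix compare as equal.
    static final Comparator<byte[]> PREFIX_COMPARATOR = (left, right) -> {
        int prefixLength = Math.min(left.length, right.length);
        for (int i = 0; i < prefixLength; i++) {
            int l = left[i] & 0xff;
            int r = right[i] & 0xff;
            if (l != r) {
                return Integer.compare(l, r);
            }
        }
        return 0;
    };

    public static void main(String[] args) {
        byte[] k0001 = {0, 0, 0, 1};
        byte[] k0002 = {0, 0, 0, 2};
        byte[] k0001000F = {0, 0, 0, 1, 0, 0, 0, 0x0F};
        byte[] k0002000F = {0, 0, 0, 2, 0, 0, 0, 0x0F};

        System.out.println(PREFIX_COMPARATOR.compare(k0001, k0001000F) == 0); // true
        System.out.println(PREFIX_COMPARATOR.compare(k0001000F, k0001) == 0); // true
        System.out.println(PREFIX_COMPARATOR.compare(k0002000F, k0001) > 0);  // true
        System.out.println(PREFIX_COMPARATOR.compare(k0001000F, k0002) < 0);  // true
    }
}
```

Because keys of different lengths can compare as equal, this is not a consistent 
total order; it is suited to checking a key against a range bound, not to 
sorting a collection.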
   This new prefix comparator is used when the `range` and `reverseRange` 
iterators are called with `prefixScan=true`. Otherwise, the current 
lexicographic byte comparator, which compares the full key, is used.
   ```
   KeyValueIterator<K, V> range(K from, K to, boolean prefixScan)
   KeyValueIterator<K, V> reverseRange(K from, K to, boolean prefixScan)
   ```
   
   *Summary of testing strategy (including rationale)
   for the feature or bug fix. Unit and/or integration
   tests are expected for any behaviour change and
   system tests should be considered for larger changes.*
   
   - Added unit tests
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   



