Jungtaek Lim created SPARK-55997:
------------------------------------
Summary: Set exclusive upper bound of prefix scan for RocksDB
Key: SPARK-55997
URL: https://issues.apache.org/jira/browse/SPARK-55997
Project: Spark
Issue Type: Improvement
Components: Structured Streaming
Affects Versions: 4.2.0
Reporter: Jungtaek Lim

For prefix scan (and for iterators over a column family) in the RocksDB state
store provider, we create an iterator, seek to the first valid key for the
prefix (or virtual column family, vcf), and call next until there are no more
keys or the current key is out of bounds.
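
As a rough illustration of the seek-then-next pattern described above, here is
a minimal Python sketch over a sorted list of byte-string keys. It is a toy
model, not the state store code: the function name prefix_scan and the sample
keys are made up for this example.

```python
import bisect

def prefix_scan(sorted_keys, prefix):
    """Yield every key that starts with `prefix` from a sorted list of bytes."""
    # "seek": position at the first key that is >= prefix
    i = bisect.bisect_left(sorted_keys, prefix)
    # "next": advance until keys run out or the key leaves the prefix
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        yield sorted_keys[i]
        i += 1

keys = sorted([b"a1", b"a2", b"b1", b"b2", b"c1"])
result = list(prefix_scan(keys, b"a"))
```

Note that the out-of-bound check happens in the caller, after the iterator has
already produced the next key.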

When next is called, RocksDB has to find the next valid key from the current
position. The problem lies in the word "valid". Say column family vcf1 is set
up for prefix scan and its keys have the prefixes 'a', 'b', and 'c' (for
simplicity). After we remove all keys with prefix 'b', a prefix scan of 'a' has
to step through every tombstone for 'b' before it finally finds a valid key
under prefix 'c'. Only then can the caller see that the key is out of bounds
and stop, which can take a long time if prefix 'b' held a large number of keys.
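
To make the cost concrete, here is a toy Python model (not RocksDB itself) in
which deleted keys stay in the sorted data as tombstones. The names TOMBSTONE
and scan_prefix_counting, and the key layout, are assumptions made only for
this sketch. It counts how many tombstones an unbounded scan of prefix 'a'
steps over before it reaches a valid key.

```python
import bisect

TOMBSTONE = None  # marks a deleted key that has not been compacted away yet

def scan_prefix_counting(entries, prefix):
    """Scan `prefix` over sorted (key, value) pairs without an upper bound.

    Returns (live keys under the prefix, number of tombstones stepped over).
    """
    keys = [k for k, _ in entries]
    i = bisect.bisect_left(keys, prefix)
    live, skipped = [], 0
    while i < len(entries):
        key, value = entries[i]
        if value is TOMBSTONE:
            skipped += 1   # must be stepped over to find the next valid key
        elif key.startswith(prefix):
            live.append(key)
        else:
            break          # first valid key is outside the prefix: caller stops
        i += 1
    return live, skipped

entries = [(b"a0", b"v"), (b"a1", b"v")]
entries += [(b"b%04d" % n, TOMBSTONE) for n in range(1000)]  # 'b' fully deleted
entries += [(b"c0", b"v")]
live, skipped = scan_prefix_counting(entries, b"a")
```

In this model the scan returns only the two 'a' keys, yet it walks over all
1000 tombstones left behind for 'b'.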

Since we use virtual column families (vcf) and a vcf is identified by its
prefix, we have a similar problem "across" vcfs. Suppose there are two virtual
column families, vcf1 and vcf2, where vcf1 is used for prefix scan and vcf2 is
used for range scan (with its keys removed sequentially as the
watermark/timestamp advances). A prefix scan on the last prefix of vcf1 has to
step through the tombstones of vcf2 before it finally finds a valid key in
vcf2.

RocksDB lets us set an (exclusive) upper bound on an iterator: once the
iterator knows it is about to go out of bounds, it stops looking for the next
key and returns immediately. This functionality should fix the above issues.
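
A sketch of how such a bound could be derived and used, again as a toy Python
model rather than the actual change. The helper names prefix_upper_bound and
bounded_scan are hypothetical, and the bound is assumed to be the smallest byte
string greater than every key with the given prefix. In RocksDB itself the
bound is passed through ReadOptions (iterate_upper_bound in the C++ API).

```python
import bisect

def prefix_upper_bound(prefix):
    """Smallest byte string greater than every key starting with `prefix`.

    Returns None when no such bound exists (empty or all-0xFF prefix).
    """
    stripped = prefix.rstrip(b"\xff")  # trailing 0xFF bytes cannot be incremented
    if not stripped:
        return None
    return stripped[:-1] + bytes([stripped[-1] + 1])

def bounded_scan(entries, prefix):
    """Scan `prefix` over sorted (key, value) pairs; None values are tombstones.

    Returns (live keys under the prefix, number of tombstones stepped over).
    """
    upper = prefix_upper_bound(prefix)
    keys = [k for k, _ in entries]
    i = bisect.bisect_left(keys, prefix)
    live, skipped = [], 0
    while i < len(entries):
        key, value = entries[i]
        if upper is not None and key >= upper:
            break          # out of bound: stop before touching later tombstones
        if value is None:
            skipped += 1   # tombstone inside the prefix itself
        else:
            live.append(key)
        i += 1
    return live, skipped

entries = [(b"a0", b"v"), (b"a1", b"v")]
entries += [(b"b%04d" % n, None) for n in range(1000)]  # 'b' fully deleted
entries += [(b"c0", b"v")]
live, skipped = bounded_scan(entries, b"a")
```

With the bound b"b" in place, the scan of 'a' stops at the first 'b' entry and
never walks the tombstones. The same idea applies across vcfs, since the vcf
identifier is part of the key prefix.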