Jungtaek Lim created SPARK-55997:
------------------------------------

             Summary: Set exclusive upper bound of prefix scan for RocksDB
                 Key: SPARK-55997
                 URL: https://issues.apache.org/jira/browse/SPARK-55997
             Project: Spark
          Issue Type: Improvement
          Components: Structured Streaming
    Affects Versions: 4.2.0
            Reporter: Jungtaek Lim


For a prefix scan (and for an iterator over a column family) in the RocksDB 
state store provider, we create an iterator, seek to the first valid key for 
the prefix (or for the virtual column family, vcf), and call next until there 
are no more keys or the returned key is out of bounds.

When next is called, RocksDB has to find the next valid key from the current 
position. The issue is the "valid" key. Say there is a column family vcf1 
that is used for prefix scans, and the key prefixes are 'a', 'b', and 'c' 
(for simplicity). After we remove all keys for prefix 'b', a prefix scan of 
prefix 'a' has to go through all tombstones for 'b' before it finds the first 
valid key in prefix 'c', which can take a long time if the number of keys for 
prefix 'b' was large.
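The cost described above can be illustrated with a toy model. This is only a 
sketch and not RocksDB code: the sorted list of (key, is_tombstone) entries 
and the helper names build_entries and prefix_scan are assumptions made for 
illustration. It counts how many entries an unbounded scan of prefix 'a' has 
to inspect once every key under 'b' has been deleted.

```python
from bisect import bisect_left

# Toy model (assumption): a sorted list of (key, is_tombstone) entries
# stands in for the on-disk key space. Real RocksDB internals differ.
def build_entries(live_a, deleted_b, live_c):
    entries = [(b"a%04d" % i, False) for i in range(live_a)]
    entries += [(b"b%04d" % i, True) for i in range(deleted_b)]
    entries += [(b"c%04d" % i, False) for i in range(live_c)]
    return entries  # already sorted by key


def prefix_scan(entries, prefix):
    """Return (live keys under prefix, number of entries inspected)."""
    keys = [k for k, _ in entries]
    pos = bisect_left(keys, prefix)  # seek to the first key >= prefix
    inspected = 0
    result = []
    while pos < len(entries):
        key, is_tombstone = entries[pos]
        inspected += 1
        pos += 1
        if is_tombstone:
            # The iterator has to step over deleted entries
            # before it can hand back the next valid key.
            continue
        if not key.startswith(prefix):
            # The caller only learns it is out of the prefix
            # after a valid key has been found.
            break
        result.append(key)
    return result, inspected


entries = build_entries(live_a=3, deleted_b=1000, live_c=2)
found, inspected = prefix_scan(entries, b"a")
print(len(found), inspected)
```

In this model the scan returns 3 keys but inspects 1004 entries: the 3 live 
keys, the 1000 tombstones, and the first key of 'c'.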

Since we use virtual column families (vcfs) and a vcf is identified by a key 
prefix, we have a similar problem "across" vcfs. Suppose there are two 
virtual column families, vcf1 and vcf2, where vcf1 is used for prefix scans 
while vcf2 is used for range scans (and its keys are removed sequentially as 
the watermark/timestamp advances). A prefix scan on the last prefix of vcf1 
has to go through the tombstones for vcf2 before it finds a valid key in 
vcf2.

RocksDB lets us set an exclusive upper bound on an iterator. When the 
iterator knows the next key would be out of bounds, it stops searching and 
returns immediately. Setting this bound should fix the issues above.
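As a rough sketch of the idea, the exclusive upper bound for a prefix can be 
derived by incrementing the last byte of the prefix that is not 0xFF. The 
code below is a toy model and not the Spark or RocksDB implementation: the 
helper names exclusive_upper_bound and bounded_prefix_scan and the in-memory 
entry list are assumptions for illustration. In RocksDB itself the bound is 
passed through the iterate_upper_bound read option.

```python
from bisect import bisect_left


def exclusive_upper_bound(prefix: bytes):
    """Smallest byte string greater than every key that starts with prefix.

    Returns None when no finite bound exists (empty or all-0xFF prefix).
    """
    buf = bytearray(prefix)
    while buf:
        if buf[-1] != 0xFF:
            buf[-1] += 1
            return bytes(buf)
        buf.pop()  # drop a trailing 0xFF and carry into the previous byte
    return None


def bounded_prefix_scan(entries, prefix):
    """Return (live keys under prefix, number of entries inspected)."""
    upper = exclusive_upper_bound(prefix)
    keys = [k for k, _ in entries]
    pos = bisect_left(keys, prefix)  # seek to the first key >= prefix
    inspected = 0
    result = []
    while pos < len(entries):
        key, is_tombstone = entries[pos]
        inspected += 1
        pos += 1
        if upper is not None and key >= upper:
            break  # out of bound: stop without looking for a valid key
        if is_tombstone:
            continue
        if not key.startswith(prefix):
            break
        result.append(key)
    return result, inspected


# Toy data (assumption): 3 live keys under 'a', 1000 tombstones under 'b',
# and 2 live keys under 'c', as sorted (key, is_tombstone) pairs.
entries = [(b"a%04d" % i, False) for i in range(3)]
entries += [(b"b%04d" % i, True) for i in range(1000)]
entries += [(b"c%04d" % i, False) for i in range(2)]

found, inspected = bounded_prefix_scan(entries, b"a")
print(len(found), inspected)
```

With the bound b"b" in place, the scan in this model stops at the first 
tombstone of 'b' and inspects 4 entries in total.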


