danny0405 commented on PR #17610:
URL: https://github.com/apache/hudi/pull/17610#issuecomment-3995426734

   ### RocksDB as The Replica of MDT
   
   ### The Update to RocksDB
   
   The RocksDB instances are initialized and bootstrapped from scratch by 
reading the full MDT RLI index on each job restart or task failover.
   
   The incremental index upserts inferred from the data inputs are applied 
directly to these RocksDB instances. The same upserts are passed along with 
the data payloads to the `IndexWrite` op for the actual MDT update. The 
MDT update happens in the same lifecycle as the data record write, so the 
upserts written to the MDT mirror the upserts applied to RocksDB.
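   
   A minimal sketch of the flow described above, not the PR's implementation: 
an in-memory map stands in for one RocksDB instance and another for the MDT RLI. 
The names `IndexReplicaSketch`, `IndexUpsert`, `Envelope`, `applyUpsert` and 
`indexWrite` are hypothetical and chosen only for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for one local replica of the record-level index. */
class IndexReplicaSketch {
    /** One index upsert: record key -> file group location. */
    static final class IndexUpsert {
        final String recordKey;
        final String location;
        IndexUpsert(String recordKey, String location) {
            this.recordKey = recordKey;
            this.location = location;
        }
    }

    /** A data payload travelling together with the index upsert inferred from it. */
    static final class Envelope {
        final String payload;
        final IndexUpsert upsert;
        Envelope(String payload, IndexUpsert upsert) {
            this.payload = payload;
            this.upsert = upsert;
        }
    }

    // Assumption: a plain map is used here in place of a real RocksDB instance.
    private final Map<String, String> replica = new HashMap<>();

    /** On job restart or task failover: rebuild from scratch from a full RLI read. */
    void bootstrap(Map<String, String> fullRli) {
        replica.clear();
        replica.putAll(fullRli);
    }

    /** Apply the upsert to the local replica and forward it with the data payload. */
    Envelope applyUpsert(String recordKey, String location, String payload) {
        replica.put(recordKey, location);
        return new Envelope(payload, new IndexUpsert(recordKey, location));
    }

    String lookup(String recordKey) {
        return replica.get(recordKey);
    }

    /** Stand-in for the `IndexWrite` op: replays the forwarded upserts into the MDT. */
    static void indexWrite(Map<String, String> mdtRli, List<Envelope> batch) {
        for (Envelope e : batch) {
            mdtRli.put(e.upsert.recordKey, e.upsert.location);
        }
    }
}
```

   In this sketch the replica sees an upsert before the MDT does, and the MDT 
catches up once `indexWrite` replays the same upserts, so both end with the same 
mapping.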
   
   The RLI is used in two cases:
   
   - it serves as the source of truth for the index mapping and is used to 
bootstrap RocksDB
   - cross-engine compatibility
   
   The new write flow with the RocksDB replica:
   <img width="4704" height="1394" alt="image" 
src="https://github.com/user-attachments/assets/2e36f972-54d7-482e-8d62-045bc96b07a2"
 />
   
   ### The Cleaning/Eviction of Index Payloads in RocksDB
   
   For global RLI, the RocksDB instance is closed and removed each time a 
task fails over or the job restarts.
   
   For partitioned RLI, each `BucketAssign` task has a local RocksDB instance in 
which the payloads under the same data partition are stored as a separate column 
family. When the data partition is based on datetime, a column family can be 
dropped very efficiently based on a configurable partition lookup TTL.
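   
   A minimal sketch of the per-partition eviction idea, under stated 
assumptions: each inner map stands in for one column family, partition paths are 
ISO dates such as `2024-01-31`, and the TTL is expressed in days. The class and 
method names are hypothetical, not part of the PR.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Hypothetical stand-in for a per-task RocksDB instance with one family per partition. */
class PartitionedIndexSketch {
    // partition path -> (record key -> location); each inner map models a column family.
    private final Map<String, Map<String, String>> families = new HashMap<>();
    private final int ttlDays;

    PartitionedIndexSketch(int ttlDays) {
        this.ttlDays = ttlDays;
    }

    void put(String partition, String recordKey, String location) {
        families.computeIfAbsent(partition, p -> new HashMap<>()).put(recordKey, location);
    }

    String get(String partition, String recordKey) {
        Map<String, String> family = families.get(partition);
        return family == null ? null : family.get(recordKey);
    }

    /**
     * Drops every family whose partition date is older than (today - ttlDays).
     * A whole family is removed in one step; no per-key deletes are issued.
     * Returns the number of families dropped.
     */
    int evictExpired(LocalDate today) {
        LocalDate cutoff = today.minusDays(ttlDays);
        int dropped = 0;
        Iterator<String> it = families.keySet().iterator();
        while (it.hasNext()) {
            if (LocalDate.parse(it.next()).isBefore(cutoff)) {
                it.remove();
                dropped++;
            }
        }
        return dropped;
    }
}
```

   With a real RocksDB instance, the `it.remove()` step would presumably 
correspond to dropping the partition's column family rather than deleting its 
keys one by one.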
   
   ### The Additional Storage Cost
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
