Paragrf commented on PR #3375:
URL: https://github.com/apache/kvrocks/pull/3375#issuecomment-4053755871

   > > @sryanyuan Would the compaction pressure triggered by a sudden burst of 
massive deletes be a potential risk point?
   > 
   > You're correct that a sudden burst of massive deletes can increase 
compaction pressure in RocksDB.
   > 
   > In our implementation, FLUSHSLOTS uses RocksDB's DeleteRange API, which is 
optimized to mark key ranges for deletion without immediately touching every 
SST file entry. This reduces write amplification compared to issuing many 
individual Delete operations.
   > 
   > However, DeleteRange still leaves tombstones and invalidated data in the 
LSM tree, which will be cleaned up during subsequent compactions. This GC phase 
will rewrite SST files to reclaim space, and can cause extra compaction load.
   > 
   > This command is intended for special operational scenarios — for example, 
when a cluster is not serving live traffic and an operator needs to bulk-clear 
data for specific slots. In such cases, temporary compaction pressure is 
acceptable, and we avoid running it during normal high-load periods.
   > 
   > To mitigate impact, we recommend:
   > 
   > * Running FLUSHSLOTS during off-peak hours.
   > * Limiting the number of slots cleared in a single operation (e.g., only a 
small fraction of the 16,384 slots at a time), to reduce the volume of deleted 
data and spread GC load over time.
   > * Adjusting compaction options (e.g., background threads, delete range 
thresholds) if necessary.
   > * Optionally forcing manual compaction on affected ranges after deletion.
   
   Exactly. If DeleteRange removes half of the data at once, it is very likely 
to trip RocksDB's automatic compaction triggers and start background compaction 
right away, at a time we do not control. That could have a large and 
unpredictable impact on the cluster's performance.
   
   So I'm wondering if we should avoid calling DeleteRange immediately after 
slot migration and topology updates. Instead, would it be less impactful on the 
cluster to simply decommission and destroy the entire instance once all its 
data has been migrated out? This might be a cleaner way to offload the 
compaction overhead.
   
   Since horizontal scaling is the goal, a smoother path would be to split the 
source node's slots in two and migrate each half to a separate target node. 
Once the migration is finished, the original node holds no data and can be 
retired as described above, so DeleteRange never has to run on it.
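
   To make the split concrete, here is a rough sketch of how the source node's 
slot ranges could be divided into two halves, one per target node. It is not 
kvrocks code: the `split_slot_ranges` helper and the inclusive `(start, end)` 
range representation are made up for illustration, and it balances by slot 
count only, which assumes data is spread fairly evenly across slots.

```python
from typing import List, Tuple

# Hypothetical representation: an inclusive (start, end) slot range.
SlotRange = Tuple[int, int]


def split_slot_ranges(
    ranges: List[SlotRange],
) -> Tuple[List[SlotRange], List[SlotRange]]:
    """Split slot ranges into two halves with near-equal slot counts.

    The first half receives the extra slot when the total is odd.
    """
    ordered = sorted(ranges)
    total = sum(end - start + 1 for start, end in ordered)
    need = (total + 1) // 2  # slots still to be assigned to the first half

    first: List[SlotRange] = []
    second: List[SlotRange] = []
    for start, end in ordered:
        size = end - start + 1
        if need >= size:
            # The whole range fits into the first half.
            first.append((start, end))
            need -= size
        elif need > 0:
            # The range straddles the midpoint, so cut it in two.
            cut = start + need - 1
            first.append((start, cut))
            second.append((cut + 1, end))
            need = 0
        else:
            second.append((start, end))
    return first, second


if __name__ == "__main__":
    # A source node owning slots 0..8191 would hand 0..4095 to one target
    # and 4096..8191 to the other.
    print(split_slot_ranges([(0, 8191)]))
```

   Each half would then go through the normal slot migration to its target, 
and the emptied source node would be removed instead of being cleaned in place.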
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
