Paragrf commented on PR #3375: URL: https://github.com/apache/kvrocks/pull/3375#issuecomment-4053755871
> > @sryanyuan Would the compaction pressure triggered by a sudden burst of massive deletes be a potential risk point?
>
> You're correct that a sudden burst of massive deletes can increase compaction pressure in RocksDB.
>
> In our implementation, FLUSHSLOTS uses RocksDB's DeleteRange API, which is optimized to mark key ranges for deletion without immediately touching every SST file entry. This reduces write amplification compared to issuing many individual Delete operations.
>
> However, DeleteRange still leaves tombstones and invalidated data in the LSM tree, which will be cleaned up during subsequent compactions. This GC phase will rewrite SST files to reclaim space, and can cause extra compaction load.
>
> This command is intended for special operational scenarios — for example, when a cluster is not serving live traffic and an operator needs to bulk-clear data for specific slots. In such cases, temporary compaction pressure is acceptable, and we avoid running it during normal high-load periods.
>
> To mitigate impact, we recommend:
>
> * Running FLUSHSLOTS during off-peak hours.
> * Limiting the number of slots cleared in a single operation (e.g., only a small fraction of the 16,384 slots at a time), to reduce the volume of deleted data and spread GC load over time.
> * Adjusting compaction options (e.g., background threads, delete range thresholds) if necessary.
> * Optionally forcing manual compaction on affected ranges after deletion.

Exactly. If DeleteRange removes half of the data at once, it is highly likely to cross RocksDB's internal monitoring thresholds and trigger an involuntary background compaction immediately. That could have a massive, unpredictable impact on the cluster's performance.

So I'm wondering if we should avoid calling DeleteRange immediately after slot migration and topology updates. Instead, would it be less impactful on the cluster to simply decommission and destroy the entire instance once all of its data has been migrated out?
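To make the "limit the slots cleared per operation" mitigation concrete, here is a minimal sketch of batching slot clears so that each FLUSHSLOTS call only touches a small fraction of the 16,384-slot keyspace. The batch size and helper names are illustrative assumptions, not the kvrocks implementation; the slot hash shown is the standard CRC16/XMODEM-mod-16384 scheme that Redis Cluster (and cluster-compatible stores) use for key slots.

```python
# Illustrative sketch only: batch sizes and function names are assumptions,
# not part of the kvrocks FLUSHSLOTS implementation.

def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM (poly 0x1021), the checksum Redis Cluster uses for slots."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: bytes) -> int:
    """Map a key to one of the 16,384 cluster slots (hash tags omitted for brevity)."""
    return crc16_xmodem(key) % 16384

def batched_slots(slots, batch_size=256):
    """Split a slot set into small sorted batches so each clear operation
    deletes only a fraction of the data, spreading tombstone GC over time."""
    ordered = sorted(slots)
    for i in range(0, len(ordered), batch_size):
        yield ordered[i:i + batch_size]

# Example: clearing half the keyspace (slots 0..8191) in 32 batches of 256,
# e.g. one batch per off-peak maintenance window.
batches = list(batched_slots(range(8192), batch_size=256))
```

Between batches an operator could optionally trigger a manual compaction over the just-cleared key range, so tombstone GC happens on a schedule rather than when RocksDB's own thresholds force it.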
This might be a cleaner way to offload the compaction overhead. Since horizontal scaling is the goal, it would be much smoother to split the source node's data and migrate each half to two separate target nodes. Once the migration is finished and the original node is left with zero data, we can simply decommission and destroy the instance. This avoids the DeleteRange overhead entirely.
