[jira] [Updated] (HDDS-13003) Snapshot Defragmentation to reduce storage footprint

Siyao Meng (Jira) Thu, 24 Jul 2025 08:44:04 -0700


     [ 
https://issues.apache.org/jira/browse/HDDS-13003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Siyao Meng updated HDDS-13003:
------------------------------
    Description: 
In Apache Ozone, snapshots currently take a checkpoint of the Active Object 
Store (AOS) RocksDB each time a snapshot is created and track the compaction of 
SST files over time. This model works efficiently when snapshots are 
short-lived, as they merely serve as hard links to the AOS RocksDB. However, 
over time, if an older snapshot persists while significant churn occurs in the 
AOS RocksDB (due to compactions and writes), the snapshot RocksDB may diverge 
significantly from both the AOS RocksDB and other snapshot RocksDB instances. 
This divergence increases storage requirements linearly with the number of 
snapshots.
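
The hard-link behaviour described above can be sketched at the filesystem level. This is only an illustration, not Ozone or RocksDB code: the file names are hypothetical, and "compaction" is simplified to replacing one file with a rewritten copy. It shows that a hard-linked snapshot file shares storage with the active store only until the active store rewrites its file.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    aos_sst = os.path.join(d, "aos_000001.sst")        # hypothetical file name
    snap_sst = os.path.join(d, "snapshot_000001.sst")  # hypothetical file name

    with open(aos_sst, "wb") as f:
        f.write(b"x" * 1024)

    # Snapshot creation modeled as a hard link: both paths point at one inode,
    # so no additional data blocks are consumed.
    os.link(aos_sst, snap_sst)
    shared_before = os.stat(aos_sst).st_ino == os.stat(snap_sst).st_ino

    # Simplified "compaction": the active store writes a new file and swaps it
    # in. The snapshot keeps the old inode alive, so the data now exists twice.
    compacted = os.path.join(d, "aos_000002.sst")
    with open(compacted, "wb") as f:
        f.write(b"x" * 1024)
    os.replace(compacted, aos_sst)
    shared_after = os.stat(aos_sst).st_ino == os.stat(snap_sst).st_ino

print(shared_before, shared_after)
```

Once the two paths no longer share an inode, the snapshot's copy counts fully against disk usage, which is the divergence the description refers to.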


The primary inefficiency in the current snapshotting mechanism stems from 
constant RocksDB compactions in AOS, which can cause a key, file, or directory 
entry to appear in multiple SST files. Ideally, each unique key, file, or 
directory entry should reside in only one SST file, eliminating redundant 
storage and mitigating the multiplier effect caused by snapshots. If 
implemented correctly, the total RocksDB size would be proportional to the 
total number of unique keys in the system rather than the number of snapshots.
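
As a rough illustration of the sizing argument above, the following toy model compares the footprint when each snapshot pins its own post-compaction SST file against the ideal layout in which every unique entry is stored once. It is not Ozone code: the SST file names, the key names, and the cost of one unit per entry are all assumptions made up for the example.

```python
# Toy model, not Ozone code: every name below is hypothetical.
# Each SST file is modeled as a set of key entries; each entry costs 1 unit.

def footprint(sst_files):
    """Total stored entries across all SST files (duplicates are counted)."""
    return sum(len(keys) for keys in sst_files.values())

def unique_keys(sst_files):
    """Number of distinct entries across all SST files."""
    return len(set().union(*sst_files.values()))

# Three snapshots taken between compactions: each compaction rewrote the same
# keys into a new SST file, so every snapshot pins its own copy.
fragmented = {
    "snap1.sst": {"k1", "k2", "k3"},
    "snap2.sst": {"k1", "k2", "k3", "k4"},
    "snap3.sst": {"k1", "k2", "k3", "k4", "k5"},
}

# Ideal layout: each unique entry lives in exactly one SST file, which the
# snapshots can share.
defragmented = {
    "shared_a.sst": {"k1", "k2", "k3"},
    "shared_b.sst": {"k4"},
    "shared_c.sst": {"k5"},
}

print(footprint(fragmented), unique_keys(fragmented))
print(footprint(defragmented), unique_keys(defragmented))
```

In the fragmented layout the footprint grows with every additional snapshot even though the number of unique keys barely changes; in the defragmented layout the footprint equals the number of unique keys.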

----

Note: *Snapshot Defragmentation* was previously called *Snapshot Compaction* 
during development and in the design doc. It was renamed because the name 
*Snapshot Compaction* is easily confused with *[RocksDB 
Compaction|https://github.com/facebook/rocksdb/wiki/Compaction]*, which is a 
different concept.

  was:
In Apache Ozone, snapshots currently take a checkpoint of the Active Object 
Store (AOS) RocksDB each time a snapshot is created and track the compaction of 
SST files over time. This model works efficiently when snapshots are 
short-lived, as they merely serve as hard links to the AOS RocksDB. However, 
over time, if an older snapshot persists while significant churn occurs in the 
AOS RocksDB (due to compactions and writes), the snapshot RocksDB may diverge 
significantly from both the AOS RocksDB and other snapshot RocksDB instances. 
This divergence increases storage requirements linearly with the number of 
snapshots.


The primary inefficiency in the current snapshotting mechanism stems from 
constant RocksDB compactions in AOS, which can cause a key, file, or directory 
entry to appear in multiple SST files. Ideally, each unique key, file, or 
directory entry should reside in only one SST file, eliminating redundant 
storage and mitigating the multiplier effect caused by snapshots. If 
implemented correctly, the total RocksDB size would be proportional to the 
total number of unique keys in the system rather than the number of snapshots.

 


> Snapshot Defragmentation to reduce storage footprint
> ----------------------------------------------------
>
>                 Key: HDDS-13003
>                 URL: https://issues.apache.org/jira/browse/HDDS-13003
>             Project: Apache Ozone
>          Issue Type: Improvement
>          Components: Ozone Manager
>            Reporter: Swaminathan Balachandran
>            Assignee: Swaminathan Balachandran
>            Priority: Major
>              Labels: pull-request-available
>
> In Apache Ozone, snapshots currently take a checkpoint of the Active Object 
> Store (AOS) RocksDB each time a snapshot is created and track the compaction 
> of SST files over time. This model works efficiently when snapshots are 
> short-lived, as they merely serve as hard links to the AOS RocksDB. However, 
> over time, if an older snapshot persists while significant churn occurs in 
> the AOS RocksDB (due to compactions and writes), the snapshot RocksDB may 
> diverge significantly from both the AOS RocksDB and other snapshot RocksDB 
> instances. This divergence increases storage requirements linearly with the 
> number of snapshots.
> The primary inefficiency in the current snapshotting mechanism stems from 
> constant RocksDB compactions in AOS, which can cause a key, file, or 
> directory entry to appear in multiple SST files. Ideally, each unique key, 
> file, or directory entry should reside in only one SST file, eliminating 
> redundant storage and mitigating the multiplier effect caused by snapshots. 
> If implemented correctly, the total RocksDB size would be proportional to the 
> total number of unique keys in the system rather than the number of snapshots.
> ----
> Note: *Snapshot Defragmentation* was previously called *Snapshot Compaction* 
> during development and in the design doc. It was renamed because the name 
> *Snapshot Compaction* is easily confused with *[RocksDB 
> Compaction|https://github.com/facebook/rocksdb/wiki/Compaction]*, which is a 
> different concept.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
