On 9/1/25 14:57, Karl Vogel wrote:
> On Mon 01 Sep 2025 at 16:15:39 (-0400), David Christensen wrote:
>> a. Set the ZFS backup file system property "dedup". This will enable
>> block-level de-duplication, which can de-duplicate data more than hard
>> links alone.
>
> This option eats RAM like candy, so make sure you have plenty.

From what I have seen on FreeBSD ZFS, under load ZFS can consume as
much memory as it needs. For storage servers, this is exactly what I
want -- I paid for that memory, I want ZFS to use it. But I have
little experience with ZFS on workstations, where many processes are
competing for memory. AIUI there are tunables for ZFS, so you have options.
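
To put a rough number on "eats RAM like candy", here is a back-of-the-envelope sketch. It assumes the commonly quoted rule of thumb of about 320 bytes of memory per dedup table entry (one entry per unique block). That figure and the function name are my assumptions for illustration, not measurements from either backup box.

```python
# Rough estimate of the RAM needed to keep a ZFS dedup table (DDT) in memory.
# ASSUMPTION: about 320 bytes per entry, a commonly quoted rule of thumb;
# real usage depends on the ZFS version and on the pool contents.
DDT_BYTES_PER_ENTRY = 320


def ddt_ram_bytes(unique_data_bytes: int, avg_block_bytes: int = 128 * 1024) -> int:
    """Return the approximate in-memory size of the dedup table in bytes."""
    if avg_block_bytes <= 0:
        raise ValueError("avg_block_bytes must be positive")
    # One DDT entry per unique block; round the block count up.
    blocks = -(-unique_data_bytes // avg_block_bytes)
    return blocks * DDT_BYTES_PER_ENTRY


if __name__ == "__main__":
    tib = 1024 ** 4
    gib = 1024 ** 3
    for block in (128 * 1024, 16 * 1024):
        need = ddt_ram_bytes(tib, block)
        print(f"1 TiB unique data, {block // 1024} KiB blocks: {need / gib:.1f} GiB")
```

Under that assumption, 1 TiB of unique data in 128 KiB blocks needs about 2.5 GiB for the table, and the same data in 16 KiB blocks needs about 20 GiB, so small blocks are where the cost grows quickly.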

>> b. Set the ZFS backup file system property "compression".
>
> If you have large backup files, you can save more space by using "gzip"
> for compression. On my backup box, this is for highly-compressible data
> like large (1-3 GB) text-formatted logs:
>
>   Method  Best Compression Ratio
>   ------  ----------------------
>   gzip    8.07x
>   lz4     5.83x
>
> "gzip" takes slightly longer to store a big file, but I don't notice
> any real delays when reading it. And I'm not patient.

I agree that it is possible to choose an optimum compression algorithm
for specific data, but that implies grouping the data according to
compression algorithm.
I already have a few top-level ZFS file systems that could benefit from
this optimization -- archives, backup, cvs, images, ghost, samba, and
virtualbox. I will definitely consider it (and some other ideas) the
next time I rebuild.
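
For a sense of scale, the two ratios quoted above can be converted into the fraction of disk space saved. This is only arithmetic on Karl's figures. The 3 GB file size is taken from the upper end of the log sizes he mentions, and the function name is made up for illustration.

```python
def space_saved(ratio: float) -> float:
    """Fraction of space saved for a compression ratio such as 8.07 (meaning 8.07x)."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 - 1.0 / ratio


# Best ratios quoted in the message above.
RATIOS = {"gzip": 8.07, "lz4": 5.83}

if __name__ == "__main__":
    log_size_gb = 3.0  # upper end of the quoted 1-3 GB log files
    for method, ratio in RATIOS.items():
        on_disk = log_size_gb / ratio
        print(f"{method}: {space_saved(ratio):.1%} saved, "
              f"{log_size_gb:.0f} GB log stored in about {on_disk:.2f} GB")
```

By this measure gzip saves about 87.6% and lz4 about 82.8% on that data, so the difference is a few percentage points of the original size rather than a factor of two.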

>> 3. zfs-diff(8) -- for example, to determine the backed up directories
>> and files whose metadata and/or data have changed between two snapshots:
>
> https://bezoar.org/src/zfs-snapshots/ describes using this for faster
> incremental backups, even on spinning rust.

If I am understanding the article correctly, the author wrote a script
that runs zfs diff on a ZFS file system against its last snapshot and
copies the changed files to another filesystem (?). I can see how this
could be useful if the author uses zfs-auto-snapshot(8) to take daily
snapshots and wants to save modified files more frequently on demand,
but I think I would write a script that runs zfs-auto-snapshot(8) on
demand and encodes the current date-time in the snapshot name. But the
author's approach makes it easy to see what changed, while my approach
would require another script to list only those files that changed.
TIMTOWTDI.
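
A minimal sketch of the two pieces mentioned above: building a snapshot name that encodes the date-time, and listing only the changed files from "zfs diff -H" output. This is not the article author's script. The dataset name "tank/backup", the "manual-" prefix, and the function names are hypothetical, and the sample text is hand-written in the tab-separated format that -H produces.

```python
from datetime import datetime
from typing import List


def snapshot_name(dataset: str, now: datetime) -> str:
    """Build a snapshot name like 'tank/backup@manual-20250901-145700'.

    The 'manual-' prefix and the timestamp layout are hypothetical choices.
    """
    return f"{dataset}@manual-{now.strftime('%Y%m%d-%H%M%S')}"


def zfs_diff_command(old_snap: str, new_snap: str) -> List[str]:
    """Argument list for a parsable diff between two snapshots (not executed here)."""
    return ["zfs", "diff", "-H", old_snap, new_snap]


def changed_paths(diff_output: str) -> List[str]:
    """Return paths that were added, modified, or renamed.

    Expects `zfs diff -H` output: a change type (+, -, M, R), a tab, then the
    path; renames carry the old and the new path as two tab-separated fields.
    Removed entries (-) are skipped because there is nothing left to copy.
    Caveat: zfs may escape unusual characters in paths; this sketch does not
    decode such escapes.
    """
    paths = []
    for line in diff_output.splitlines():
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # blank or unrecognized line
        change = fields[0]
        if change in ("+", "M"):
            paths.append(fields[1])
        elif change == "R" and len(fields) >= 3:
            paths.append(fields[2])  # keep the new name
    return paths


if __name__ == "__main__":
    sample = (
        "M\t/tank/backup/etc/hosts\n"
        "+\t/tank/backup/home/new.txt\n"
        "-\t/tank/backup/home/old.txt\n"
        "R\t/tank/backup/a.log\t/tank/backup/b.log\n"
    )
    print(snapshot_name("tank/backup", datetime(2025, 9, 1, 14, 57, 0)))
    for path in changed_paths(sample):
        print(path)
```

The command list could be passed to subprocess.run() on a machine with ZFS and the resulting stdout fed to changed_paths(). Directories show up as modified too, so a real copy step would need to filter on file type.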
David