On 1/24/25 3:03 AM, William Seligman wrote:
I'll start by acknowledging that this is not a bug in backintime. If this is the wrong place to ask this question, please consider suggesting where I should ask instead.

I've set up a Linux server to serve as a repository for remote backintime backups for several systems. I'm trying to create a report, for each system's backup, of the amount of disk space used for the latest snapshot (easy, I'll show the command below), and total disk space used for all the snapshots for that system.

Getting the disk space used for the latest snapshot (or any particular snapshot) is not difficult. For example:

/usr/bin/du -sx /pool/backup/backintime/mysystem.example.com/user/1/last_snapshot/backup/*

The problem comes when I want to see the total disk use for `mysystem.example.com`. This command gives the correct answer, as far as I can tell:

/usr/bin/du -sx /pool/backup/backintime/mysystem.example.com/

The problem is that while the answer appears to be correct, as the number of snapshots increases, the du command takes longer and longer to execute. For some of the backups with a large number of files, it takes hours for that one du command to execute.

My guess is that this has something to do with how du is handling the hard links in order to get that correct answer. Based on my fiddling around, it appears that du is visiting every snapshot directory and going through all of the files it finds, even if there are only a few files that differ between snapshots.

The result appears to be that if that first `du` command takes ten minutes due to the number of files in the snapshot, the second `du` command takes ten minutes * the number of snapshots.

Is my guess correct? Or is this due to something else? Is there any work-around?

The purpose of this report I'm creating is to understand how much actual disk space is being used over time for backintime backups. If I found that new snapshots were taking up a lot of space, it would mean the system's user was refreshing a lot of large files in-between snapshots. I'd want to re-evaluate the frequency of their backups or how long backintime retained the snapshots.

AlmaLinux 9.5
backintime 1.3.2

I obtained backintime via the EPEL repository:
dnf install backintime-qt

Disk has:

# df -h /pool
Filesystem             Size  Used Avail Use% Mounted on
/dev/mapper/POOL-pool   17T  5.1T   11T  33% /pool

# df -hi /pool
Filesystem            Inodes IUsed IFree IUse% Mounted on
/dev/mapper/POOL-pool   262M   58M  205M   22% /pool


_______________________________________________
Bit-dev mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3/lists/bit-dev.python.org/
Member address: [email protected]


Hi William,

Your guess is correct. du visits every file one by one, looks up which inode it points to, and counts it towards the total only if it is an inode it has not counted before.
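
To make that bookkeeping concrete, here is a minimal Python sketch of a du-like walk. It is not du's actual implementation; the function names `charge` and `incremental_usage` are made up for illustration, and it assumes Linux, where `st_blocks` is reported in 512-byte units. Every directory entry still costs one `lstat` call, even when its inode was already counted, which is why the run time grows with the number of snapshots.

```python
import os


def charge(path, seen):
    """Return the bytes to charge for path, or 0 if its inode was already counted."""
    st = os.lstat(path)              # one stat call per directory entry, hard-linked or not
    key = (st.st_dev, st.st_ino)     # an inode is unique only within its device
    if key in seen:
        return 0
    seen.add(key)
    return st.st_blocks * 512        # assumption: Linux reports st_blocks in 512-byte units


def incremental_usage(snapshot_dirs):
    """Map each snapshot directory to the bytes it adds on top of the earlier ones."""
    seen = set()                     # shared across snapshots, like one du invocation
    usage = {}
    for root in snapshot_dirs:
        total = charge(root, seen)
        stack = [root]
        while stack:
            with os.scandir(stack.pop()) as entries:
                for entry in entries:
                    total += charge(entry.path, seen)
                    # Descend into real directories only, never through symlinks.
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
        usage[root] = total
    return usage
```

Passing the snapshot directories oldest first charges a hard-linked file to the first snapshot that contains it, so each later value approximates the space that snapshot added.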

The thing is, a hard link is in fact indistinguishable from a "real" file, because a file is an inode (plus its data blocks) and every name for that file is a hard link pointing to the inode. An ordinary file is simply an inode with one hard link. When you have more links, nothing else changes. When the hard-link count drops to zero, the file is essentially deleted.
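
A small Python sketch can illustrate this. The helper name `hard_link_demo` and the file names are hypothetical, and it assumes a filesystem that supports hard links. Both names report the same inode and a link count of 2, and the data survives removal of the "original" name.

```python
import os
import tempfile


def hard_link_demo(directory):
    """Create a file plus a hard link to it and report what stat sees."""
    original = os.path.join(directory, "original.txt")
    alias = os.path.join(directory, "alias.txt")
    with open(original, "w") as f:
        f.write("snapshot data\n")
    os.link(original, alias)                 # second name for the same inode

    a, b = os.lstat(original), os.lstat(alias)
    same_inode = (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)
    links_before = a.st_nlink                # both names count as links

    os.unlink(original)                      # drop the "original" name
    with open(alias) as f:
        survived = f.read()                  # the data is still reachable via the alias
    links_after = os.lstat(alias).st_nlink
    return same_inode, links_before, survived, links_after


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(hard_link_demo(tmp))
```

Nothing in the stat result marks either name as the original, which is why a tool like du has to look at every entry.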

Symbolic links are links that point to a path name, that is, to one of the hard links, rather than to the inode itself. They're indirections in a sense.

Since there's no way to distinguish between a "file" and a "hard-link" (which are the same thing in a sense), there's no workaround for this phenomenon, sorry. If you want to ease the burden on your system, you can get the du counts per host (per folder in your case) and build the report that way. If your backend is an SSD, you can run these commands in parallel to get the most out of your SSD and shorten the work a bit (unrelated tip: if the SSD is an external one, watch the temperature. These drives can reach throttling temperatures when driven hard).
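
The per-host, parallel approach could be sketched in Python as follows. This is only an outline: the helper names `du_kib` and `per_host_usage` and the `max_workers` default are made up, it assumes each host has its own directory directly under the backup root, and it assumes snapshots of different hosts share no hard links, so running du separately per host does not double count.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def du_kib(path):
    """Run `du -sxk` on one directory and return its usage in KiB."""
    result = subprocess.run(
        ["du", "-sxk", path],        # -s summary, -x one filesystem, -k 1 KiB units
        check=True,
        capture_output=True,
        text=True,
    )
    # du prints "<size><TAB><path>"; keep only the size field.
    return int(result.stdout.split("\t", 1)[0])


def per_host_usage(backup_root, max_workers=4):
    """Return {host_directory: KiB used}, running one du per host in parallel."""
    hosts = sorted(
        entry.path
        for entry in os.scandir(backup_root)
        if entry.is_dir(follow_symlinks=False)
    )
    # Threads are enough here: each worker just waits for its du process.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(hosts, pool.map(du_kib, hosts)))
```

For example, `per_host_usage("/pool/backup/backintime")` would return one total per host directory. On spinning disks a low `max_workers` is probably the safer choice, since parallel directory walks compete for seeks.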

If you feel like it, you can look into diffoscope[0], to compare your directories.

If you have any doubts, or if I failed to explain this clearly, please reply to this e-mail and I'll try my best to clarify further.

Cheers,

Happy backuping,

Hakan

[0]: https://diffoscope.org/

