Gary,

Sorry about the delayed reply. I have a few moments now.

Do the files you want to back up exist on different hosts, or only on the
one server? It sounds like they're only on the one server, but please let
me know which it is.

Phil said:
But if I were you, if this rsync backup set contains *important* files
that no longer exist "in production", so to speak, I would sort out
those critical files into a distinct designated archive of their own and
add *that archive* to your backup set.

And I agree. However, that sorting process need not be manual. Consider
using dupeGuru. It is open source software that compares two datasets
(even if the folder structure isn't symmetrical) and finds duplicate files.
You will need a GUI; it cannot run from the CLI alone.

The general process will be as follows:
1. Double check the settings in dupeGuru to ensure it will be hashing
every file you intend to compare.
2. On the main tab for dupeGuru, click to compare by contents. In the
bottom pane, add two paths: the path to the root folder for your live files
and the path to the root folder for your rsync backups. For the live file
path, on the right, mark it as a "reference" dataset. dupeGuru won't take
any action against a reference dataset.
3. Double check everything, then click compare. Wait for the scans to
finish. No action will be taken until you make a decision in the software.
Expect this to take some time.
4. Look at the comparison results in the right tab. It should show you
filenames and full paths for original and duplicate files. Examine the
results for sanity, and then take action. I recommend hiding all reference
files, then re-checking that it now shows only duplicates and that they
are all in your rsync backup folder. In dupeGuru, select the option to move
the detected duplicate files to another location on your system that is not
among the live files or the rsync files.
5. In your file management tools (dolphin, Nautilus, mc, ls, whatever),
examine the rsync backup folder. Is it now much smaller, with far fewer
files? What remains are the deltas: the files that differ from those seen
in the live files.
6. Presumably a lot of duplicate files have been moved somewhere else.
Also, hopefully the space needs of the rsync files have been greatly
reduced. At this point, if you're satisfied with the results, you can
delete the duplicate files.
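
If you want a read-only sanity check before (or after) running the GUI, the
same idea can be roughed out in Python. This is only a sketch of comparing
by contents, not how dupeGuru itself is implemented. The function names are
mine, and the two paths you pass on the command line are whatever your live
root and rsync backup root happen to be. It treats the live tree as the
reference, hashes with SHA-256, and only prints the backup files whose
contents also exist somewhere in the live tree. It never moves or deletes
anything.

```python
import hashlib
import os
import sys
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def walk_files(root):
    """Yield every regular file under root, skipping symlinks."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                yield path


def find_backup_duplicates(live_root, backup_root):
    """Return sorted backup paths whose contents match a file under live_root.

    The live tree is the reference: it is only read, never reported.
    """
    # Group live files by size so only plausible matches get hashed.
    live_by_size = defaultdict(list)
    for path in walk_files(live_root):
        live_by_size[os.path.getsize(path)].append(path)

    live_hashes = {}  # size -> set of digests, filled in on demand
    duplicates = []
    for path in walk_files(backup_root):
        size = os.path.getsize(path)
        if size not in live_by_size:
            continue  # no live file has this size, so it cannot be a duplicate
        if size not in live_hashes:
            live_hashes[size] = {file_digest(p) for p in live_by_size[size]}
        if file_digest(path) in live_hashes[size]:
            duplicates.append(path)
    return sorted(duplicates)


if __name__ == "__main__":
    if len(sys.argv) != 3:
        sys.exit("usage: find_dupes.py LIVE_ROOT BACKUP_ROOT")
    for dup in find_backup_duplicates(sys.argv[1], sys.argv[2]):
        print(dup)
```

The list it prints should roughly agree with what dupeGuru shows as
duplicates in the rsync folder. If the two disagree by much, stop and work
out why before moving anything.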

Be careful. Double and triple check everything. Read the manual. Here be
dragons. Etc.

https://dupeguru.voltaicideas.net


Robert Gerber
402-237-8692
[email protected]

On Sat, Aug 23, 2025, 5:00 PM Gary Dale <[email protected]> wrote:

> On 2025-08-23 15:29, Rob Gerber wrote:
> > I don't have a lot of time right now, but my main question is "Given
> > enough time and effort, you almost certainly could do this, but should
> > you?"
> >
> > I don't mean to be a downer, but are you sure screwing around with
> > bacula and "faking" an initial backup condition is worth the risk that
> > you get something wrong and you've "tricked" bacula into thinking
> > things are ok, when they actually aren't and your backups are invalid?
> >
> > Got to run, sorry for lack of details.
> >
> Conversely, wouldn't I be able to find out if it worked fairly quickly?
> The initial backup would be the "fake" so if I can't restore a file from
> it after reverting to the real setup, then I'd know it didn't work. The
> second test would come the next day, if the backup didn't duplicate
> files already in the original and if I could still restore files.
>
> Your cautions are well taken but I've got a lot of files and trying to
> sift through them to find ones I may need in the future is a gargantuan
> task, while keeping a duplicate set of files is a large (2.7T)  waste of
> space.
>
>
_______________________________________________
Bacula-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-users
