William Kenworthy wrote:
> On 15/8/22 06:44, Dale wrote:
>> Howdy,
>>
>> With my new fiber internet, my poor disks are getting a workout, and
>> also filling up. First casualty: my backup disk. I have one directory
>> that is . . . well . . . huge. It's about 7TB or so. This is where it
>> is right now, and it's still trying to pack in files.
>>
>>
>> /dev/mapper/8tb    7.3T  7.1T  201G  98%  /mnt/8tb
>>
>>
>> Right now I'm using rsync, which doesn't compress files but does
>> update only things that have changed. I'd like to find some way,
>> software most likely (maybe there is already a tool I'm unaware of),
>> to compress data but otherwise work a lot like rsync. I looked in
>> app-backup and there are a lot of options, but I'm not sure which
>> best fits what I want to do. Again: back up a directory, compress,
>> and only update changed or new files. Generally it only adds files,
>> but sometimes a file gets replaced as well, same name but different
>> size.
>>
>> I was trying to go through the list in app-backup one by one, but to
>> be honest, most of the links included only go to github or something
>> and usually don't say anything about how the tool works. Basically,
>> as far as seeing whether it does what I want, they're useless. It
>> sort of reminds me of quite a few USE flag descriptions.
>>
>> I plan to buy another hard drive pretty soon, possibly next month.
>> If there is nothing available that does what I want, is there a way
>> to use rsync and have it set to back up files starting with "a"
>> through "k" to one spot and then back up "l" through "z" to another?
>> I could then split the files into two parts. I use a script to do
>> this now, if one can call my little things scripts, so even a
>> complicated command could work; I just may need help figuring out
>> the command.
>>
>> Thoughts? Ideas?
>>
>> Dale
>>
>> :-)  :-)
>>
> The questions you need to ask are how compressible the data is and
> how much duplication is in there. Rsync's biggest disadvantage is
> that it doesn't keep history, so if you need to restore something
> from last week you are SOL. Honestly, rsync is not a backup program
> and should only be used the way you do for data you don't value, as
> an rsync archive is a disaster waiting to happen from a backup point
> of view.
>
> Look into dirvish: it uses hard links to keep files current but
> safe, and it is easy to restore from (a backup looks like an exact
> copy, so you just cp the files back if needed). The downside is that
> it hammers the hard disk and has no compression, so its only
> deduplication is via history (my backups stabilised at about 2x the
> original size for ~2 years of history), though you can use something
> like btrfs, which has filesystem-level compression.
>
> My current program is borgbackup, which is very sophisticated in how
> it stores data; it's probably your best bet, in fact. I am storing
> literally tens of TB of raw data on a 4TB USB3 disk, going back
> years. And yes, I do restore regularly, not just for disasters but
> also for space-efficient long-term storage I access only rarely.
>
> e.g.:
>
> A single host:
>
> ------------------------------------------------------------------------------
>
>                        Original size      Compressed size    Deduplicated size
> All archives:                3.07 TB              1.96 TB            151.80 GB
>
>                        Unique chunks         Total chunks
> Chunk index:                 1026085             22285913
>
>
> Then there is my offline storage: it backs up ~15 hosts (in repos
> like the above) plus data storage like 22 years of email etc. Each
> host backs up to its own repo, then the offline storage backs that
> up.
> The deduplicated size is the actual on-disk size ... compression
> varies, as it's whatever I used at the time the backup was taken ...
> currently I have it set to "auto,zstd,11", but it can be mixed in
> the same repo (a repo is a single backup set; you can nest repos,
> which is what I do, so ~45TB stored on a 4TB offline disk). One
> advantage of a system like this is that chunked data rarely changes,
> so it's only the differences that are backed up (read the borgbackup
> docs - interesting).
>
> ------------------------------------------------------------------------------
>
>                        Original size      Compressed size    Deduplicated size
> All archives:               28.69 TB             28.69 TB              3.81 TB
>
>                        Unique chunks         Total chunks
> Chunk index:
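That does sound like the best fit so far. If I'm reading the
borgbackup docs right, a minimal version of what you describe would
look something like the sketch below. The repo path, source directory
and prune policy are just placeholders for my setup:

  # one-time: create an encrypted repo on the backup disk
  borg init --encryption=repokey /mnt/10tb/videos.borg

  # each run: make a new archive; "auto,zstd,11" lets borg skip
  # compressing chunks that don't shrink (e.g. video)
  borg create --stats --compression auto,zstd,11 \
      /mnt/10tb/videos.borg::videos-{now} /home/dale/videos

  # thin out old archives but keep some history around
  borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 6 \
      /mnt/10tb/videos.borg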
For the particular drive in question, it is 99.99% videos. I don't
want to lose any quality, but I'm not sure how much they can be
compressed, to be honest. It could be they are already as compressed
as they can be without losing resolution etc. (a quick way to test
that is in the P.S. below).

I've been lucky so far. I don't think I've ever needed something from
a backup after already overwriting it with a bad working copy.
Example: I update a video only to find the newer copy is corrupt and
I want the old one back. I've done that a time or two, but I tend to
find it before I do backups. Still, it is a downside and something
I've thought about before. I figure when it does happen, it will be
something hard to replace. Just letting the devil have his day. :-(

For that reason, I find the version-type backups interesting. It is a
safer method. You can have a new file but also keep an older file as
well, just in case the new file takes a bad turn. It is an
interesting thought, and one not only I should consider but anyone
really.

As I posted in another reply, I found a 10TB drive that should be
here by the time I do a fresh set of backups. This will give me more
time to consider things. Have I said this before a while back??? :/

Dale

:-)  :-)
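P.S. On the compression question: a quick way to check how
compressible the videos really are, assuming app-arch/zstd is
installed, is zstd's benchmark mode on a typical file (the path here
is made up). A ratio close to 1.0 would mean they're already about as
small as they'll get, and borg's "auto" mode would simply store them:

  # benchmark zstd level 11 against one sample video; prints the
  # achieved compression ratio without writing any output file
  zstd -b11 /mnt/8tb/videos/sample.mkv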

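P.P.S. And on my original a-through-k / l-through-z idea: it looks
like rsync filter rules can do that split without any new software.
A sketch, with made-up mount points; the patterns are anchored with a
leading "/" so they only match names at the top of the source tree:

  # top-level entries starting with a-k go to the first disk ...
  rsync -av --include='/[a-kA-K]*' --exclude='/*' \
      /mnt/8tb/videos/ /mnt/disk1/videos/

  # ... and everything else (l-z, digits, etc.) to the second
  rsync -av --exclude='/[a-kA-K]*' \
      /mnt/8tb/videos/ /mnt/disk2/videos/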
