On Mon, 6 Mar 2017 09:09:48 -0500, "Poison BL." <poiso...@gmail.com> wrote:

> On Mon, Mar 6, 2017 at 2:23 AM, Kai Krakow <hurikha...@gmail.com>
> wrote:
> 
> > On Tue, 14 Feb 2017 16:14:23 -0500, "Poison BL." <poiso...@gmail.com> wrote:
> > > I actually see both sides of it... as nice as it is to have a
> > > chance to recover the information from between the last backup
> > > and the death of the drive, the reduced chance of corrupt data
> > > from a silently failing (spinning) disk making it into backups is
> > > a bit of a good balancing point for me.  
> >
> > I've seen borgbackup giving me good protection against this. First,
> > it doesn't back up files which are already in the backup. So if data
> > silently changed, it won't make it into the backup. Second, it does
> > incremental backups. Even if something broke and made it into the
> > backup, you can eventually go back weeks or months to get the file
> > back. The algorithm is very efficient. And every incremental backup
> > is a full backup at the same time - so you can thin out the backup
> > history by deleting any backup at any time (it's not like a
> > traditional incremental backup, which always needs the parent
> > backup).
> >
> > OTOH, this means that every data block is only stored once. If
> > silent data corruption hits here, you lose the complete history of
> > this file (and maybe of others using the same deduplicated block).
> >
> > For the numbers: I'm storing my 1.7 TB system onto a 3 TB disk
> > which is now 2.2 TB full. The backup history spans almost a year
> > now (daily backups).
> >
> > As a sort of protection against silent data corruption, you could
> > rsync the borgbackup repository to a remote location. The
> > differences are usually small, so that should be a fast operation.
> > Maybe to some cloud storage or a RAID-protected NAS which can detect
> > and correct silent data corruption (like ZFS- or btrfs-based
> > systems).
> >
> >
> > --
> > Regards,
> > Kai
> >
> > Replies to list-only preferred.
> >  
> 
> That's some impressive backup density... I haven't looked into
> borgbackup, but it sounds like it runs on the same principles as the
> rsync+hardlink based scripts I've seen. Those will still back up
> files that've silently changed, since the checksums won't match any
> more, but they won't blow away previous copies of the file either.
> I'll have to give it a try!

Borgbackup seems to check inodes to get a list of changed files really
quickly. It only needs a few minutes to scan through millions of files
for me; rsync is way slower, and even "find" feels slower. A daily
backup usually takes 8-12 minutes for me (depending on the delta), and
thinning old backups out of the backup set takes another 1-2 minutes.
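
For illustration, here's a minimal Python sketch of that kind of file
cache (just my guess at the principle, not borg's actual code): a file
is only re-read if its inode, size or mtime changed since the last
run, which is why scanning millions of unchanged files stays fast.

    import os

    def changed_since_last_run(path, cache):
        # cache maps path -> (inode, size, mtime) from the previous run
        st = os.stat(path)
        entry = (st.st_ino, st.st_size, st.st_mtime_ns)
        if cache.get(path) == entry:
            return False      # metadata unchanged: skip re-reading the file
        cache[path] = entry   # remember the new metadata for the next run
        return True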

> As for protecting against the backup set itself getting silent
> corruption, an rsync to a remote location would help, but you would
> have to ensure it doesn't overwrite anything already there that
> may've changed, only create new.

Use the timestamp check only in rsync, not the content check. This
should work for borgbackup because it only ever creates newer files,
never older ones.
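
In Python terms the idea boils down to something like this (only a
sketch; in practice you'd let rsync itself do it, e.g. its default
quick check plus --update so nothing newer on the mirror is ever
overwritten):

    import os
    import shutil

    def mirror_if_newer(src, dst):
        # copy only if the mirror copy is missing or older than the source,
        # never overwriting a newer file on the destination side
        if not os.path.exists(dst) or os.path.getmtime(src) > os.path.getmtime(dst):
            dst_dir = os.path.dirname(dst)
            if dst_dir:
                os.makedirs(dst_dir, exist_ok=True)
            shutil.copy2(src, dst)   # copy2 preserves the modification time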

> Also, making the initial clone would
> take ages, I suspect, since it would have to rebuild the hardlink set
> for everything (again, assuming that's the trick borgbackup's using).

No, that's not the trick. Files are stored as chunks, and chunks are
split based on a moving-window checksumming algorithm so that
duplicate blocks can be detected. Deduplication is therefore done not
at the file level but at the sub-file level (block level, with
variable block sizes).
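
To illustrate the principle (a toy sketch only - borg's real chunker
is more sophisticated and tuned quite differently), such a
moving-window chunker could look like this in Python:

    import hashlib

    WINDOW = 48            # bytes in the rolling window
    MASK = (1 << 12) - 1   # ~4 KiB average chunk size in this toy example

    def split_into_chunks(data):
        chunks, start, h = [], 0, 0
        for i in range(len(data)):
            h += data[i]                    # cheap additive rolling hash
            if i - start >= WINDOW:
                h -= data[i - WINDOW]       # slide the window forward
            if i - start + 1 >= WINDOW and (h & MASK) == 0:
                chunks.append(data[start:i + 1])  # the data picks the cut point
                start, h = i + 1, 0
        if start < len(data):
            chunks.append(data[start:])
        return chunks

    def chunk_id(chunk):
        # identical chunks get identical ids, so each is stored only once
        return hashlib.sha256(chunk).hexdigest()

Because the cut points depend on the content itself, inserting a few
bytes near the start of a file only changes the chunks around the
insertion; everything after it still deduplicates against older
backups.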

Additionally, those chunks can be compressed with lz4, gzip, and I
think xz (the latter being painfully slow of course).
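
Sketched with the Python standard library (zlib standing in for gzip
and lzma for xz; lz4 would need the third-party lz4 module), per-chunk
compression is simply applied before a chunk is stored:

    import zlib
    import lzma

    def compress_chunk(chunk, algo="zlib"):
        if algo == "zlib":
            return zlib.compress(chunk, level=6)
        if algo == "lzma":
            return lzma.compress(chunk, preset=6)  # much slower, better ratio
        raise ValueError("unknown algorithm: %s" % algo)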

> One of the best options is to house the base backup set itself on
> something like zfs or btrfs on a system with ecc ram, and maintain
> checksums of everything on the side (crc32 would likely suffice, but
> sha1's fast enough these days there's almost no excuse not to use
> it). It might be possible to task tripwire to keep tabs on that side
> of it, now that I consider it. While the filesystem itself in that
> case is trying its best to prevent issues, there's always that slim
> risk that there's a bug in the filesystem code itself that eats
> something, hence the added layer of paranoia. Also, with ZFS for the
> base data set,
> you gain in-place compression,

Borgbackup already does that.

> dedup

Borgbackup does that, too.

> if you're feeling
> adventurous

You don't have to be, because you can use a simpler filesystem for
borgbackup. I'm storing on xfs and still plan to sync to a remote
location.

> (not really worth it unless you have multiple very
> similar backup sets for different systems), block level checksums,
> redundancy across physical disks, in place snapshots, and the ability
> to use zfs send/receive to do snapshot backups of the backup set
> itself.
> 
> I managed to corrupt some data with zfs (w/ dedup, on gentoo) shared
> out over nfs a while back, on a box with way too little ram (nothing
> important, throwaway VM images), hence the paranoia of secondary
> checksum auditing and still replicating the backup set for things
> that might be important.


-- 
Regards,
Kai

Replies to list-only preferred.

