[BackupPC-users] Handling machines too large to back themselves up

Dave Sherohman Thu, 08 Apr 2021 06:25:10 -0700

I have a server which I'm not able to back up because, apparently, it's just too big.

If you remember me asking about synology's weird rsync a couple weeks ago, it's that machine again. We finally solved the rsync issues by ditching the synology rsync entirely and installing one built from standard rsync source code. Using that, we were able to get one "full" backup, but it missed a bunch of files because we forgot to use sudo when we did it. (The synology rsync is set up to run suid root and is hardcoded to not allow root to run it, so we had to take sudo out for that, then forgot to add it back in when we switched to standard rsync.)

Since then, every attempted backup has failed, either full or incremental, because the synology is running out of memory:


This is the rsync child about to exec /usr/libexec/backuppc-rsync/rsync_bpc
Xfer PIDs are now 1228998,1229014
xferPids 1228998,1229014
ERROR: out of memory in receive_sums [sender]
rsync error: error allocating core memory buffers (code 22) at util2.c(118) [sender=3.2.0dev]
Done: 0 errors, 0 filesExist, 0 sizeExist, 0 sizeExistComp, 0 filesTotal, 0 sizeTotal, 0 filesNew, 0 sizeNew, 0 sizeNewComp, 32863617 inode
rsync_bpc: [generator] write error: Broken pipe (32)

The poor little NAS has only 6G of RAM vs. 9.4 TB of files (configured as two sharenames, /volume1 (8.5T) and /volume2 (885G)) and doesn't seem up to the task of updating that much at once via rsync.

Adding insult to injury, even a failed attempt to back it up causes the bpc server to take 45 minutes to copy the directory structure from the previous backup before it even attempts to connect, and then 12-14 hours doing reference counts after it finishes backing up nothing. Which makes trial-and-error painfully slow, since we can only try one thing, at most, each day.

In our last attempt, I tried flipping the order of the RsyncShareNames to do /volume2 first, thinking it might successfully back up the smaller share before running out of memory trying to process the larger one. It did not run out of memory... but it did sit there for a full 24 hours with one CPU (out of four) running pegged at 99% handling the rsync process before we finally put it out of its misery. The bpc xferlog recorded that the connection was closed unexpectedly (which is fair, since we killed the other end) after 3182 bytes were received, so the client clearly hadn't started sending data yet. And now, after that attempt, the bpc server still lists the status as "refCnt #2" another 24 hours after the client-side rsync was killed.

So, aside from adding RAM, is there anything else we can do to try to work around this? Would it be possible to break this one backup down into smaller chunks that are still recognized as a single host (so they run in sequence and don't get scheduled concurrently), but don't require the client to diff large amounts of data in one go, and maybe also speed up the reference counting a bit?
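
To make the "smaller chunks" idea concrete, here is a rough sketch in Python of how the top-level directories of a share might be grouped into batches under a size limit. This is only an illustration of the planning step, not anything BackupPC provides: the function names dir_size and plan_chunks are made up, and how the resulting groups would actually be fed to BackupPC (for example as separate share names) is left open.

```python
import os


def dir_size(path):
    """Sum the sizes in bytes of all files under path (symlinks are not followed)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def plan_chunks(sizes, limit):
    """Greedily group (path, size) pairs into chunks of at most `limit` bytes.

    Entries are taken largest first. An entry that is larger than `limit`
    on its own still gets a chunk, just with nothing else in it.
    """
    chunks, current, used = [], [], 0
    for path, size in sorted(sizes, key=lambda item: item[1], reverse=True):
        if current and used + size > limit:
            chunks.append(current)
            current, used = [], 0
        current.append(path)
        used += size
    if current:
        chunks.append(current)
    return chunks
```

On the NAS this could be run as something like plan_chunks([(d, dir_size(d)) for d in top_level_dirs], limit), where top_level_dirs is a hypothetical list of the directories directly under /volume1. Walking 8.5T just to measure it would be slow in its own right, so the sizes might be better taken from an existing du report.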

An "optimization" (or at least an option) to completely skip the reference count updates after a backup fails with zero files received (and, therefore, no new/changed references to worry about) might also not be a bad idea.

_______________________________________________
BackupPC-users mailing list
[email protected]
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    https://github.com/backuppc/backuppc/wiki
Project: https://backuppc.github.io/backuppc/
