I have a server which I'm not able to back up because, apparently, it's
just too big.
If you remember me asking about Synology's weird rsync a couple of weeks
ago, it's that machine again. We finally solved the rsync issues by
ditching the Synology rsync entirely and installing one built from
standard rsync source code. With that, we were able to get one "full"
backup, but it missed a bunch of files because we forgot to use sudo
when we did it. (The Synology rsync is set up to run suid root and is
hardcoded to not allow root to run it, so we had to take sudo out for
that, then forgot to add it back in when we switched to standard rsync.)
Since then, every attempted backup, full or incremental, has failed
because the Synology runs out of memory:
This is the rsync child about to exec /usr/libexec/backuppc-rsync/rsync_bpc
Xfer PIDs are now 1228998,1229014
xferPids 1228998,1229014
ERROR: out of memory in receive_sums [sender]
rsync error: error allocating core memory buffers (code 22) at util2.c(118) [sender=3.2.0dev]
Done: 0 errors, 0 filesExist, 0 sizeExist, 0 sizeExistComp, 0 filesTotal, 0 sizeTotal, 0 filesNew, 0 sizeNew, 0 sizeNewComp, 32863617 inode
rsync_bpc: [generator] write error: Broken pipe (32)
The poor little NAS has only 6G of RAM vs. 9.4 TB of files (configured
as two sharenames, /volume1 (8.5T) and /volume2 (885G)) and doesn't seem
up to the task of updating that much at once via rsync.
Adding insult to injury, even a failed attempt to back it up causes the
bpc server to take 45 minutes to copy the directory structure from the
previous backup before it even attempts to connect, and then 12-14 hours
doing reference counts after it finishes backing up nothing. Which
makes trial-and-error painfully slow, since we can only try one thing,
at most, each day.
In our last attempt, I tried flipping the order of the RsyncShareNames
to do /volume2 first, thinking it might back up the smaller share
successfully before running out of memory trying to process the
larger one. It did not run out of memory... but it did sit there for a
full 24 hours with one CPU (out of four) running pegged at 99% handling
the rsync process before we finally put it out of its misery. The bpc
xferlog recorded that the connection was closed unexpectedly (which is
fair, since we killed the other end) after 3182 bytes were received, so
the client clearly hadn't started sending data yet. And now, after that
attempt, the bpc server still lists the status as "refCnt #2" another 24
hours after the client-side rsync was killed.
So, aside from adding RAM, is there anything else we can do to try to
work around this? Would it be possible to break this one backup down
into smaller chunks that are still recognized as a single host (so they
run in sequence and don't get scheduled concurrently), but don't require
the client to diff large amounts of data in one go, and maybe also speed
up the reference counting a bit?
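To make the "smaller chunks" idea concrete, here is a rough sketch of
what I have in mind: list the top-level directories of a volume and turn
them into a longer RsyncShareName list, so each directory becomes its
own share under the same host. This is untested on this machine and
rests on assumptions: that each RsyncShareName entry gets its own rsync
run in sequence, and that smaller shares actually keep the client within
its memory. The function names top_level_shares and format_share_list
are made up for illustration and are not part of BackupPC.

```python
import os


def top_level_shares(volume_root):
    """Return one path per real top-level directory under volume_root, sorted."""
    shares = []
    for entry in sorted(os.listdir(volume_root)):
        path = os.path.join(volume_root, entry)
        # Only real directories become shares; plain files and symlinks
        # at the top level are skipped and would need separate handling.
        if os.path.isdir(path) and not os.path.islink(path):
            shares.append(path)
    return shares


def format_share_list(shares):
    """Render paths as a Perl list assignment for $Conf{RsyncShareName}.

    Assumes no path contains a single quote; no escaping is attempted.
    """
    lines = ["$Conf{RsyncShareName} = ["]
    lines += ["    '%s'," % share for share in shares]
    lines.append("];")
    return "\n".join(lines)


# Example use on the NAS (hypothetical, run by hand):
#   print(format_share_list(top_level_shares("/volume1")))
```

The output would be pasted into the per-host config in place of the
current two-entry list. Files sitting directly in the volume root would
not be covered by any of the generated shares, so those would still need
an entry of their own.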
An "optimization" (or at least an option) to completely skip the
reference count updates after a backup fails with zero files received
(and, therefore, no new/changed references to worry about) might also
not be a bad idea.