On 20/7/19 07:08, David Koski wrote:
On 7/16/19 4:27 PM, Adam Goryachev wrote:
On 17/7/19 4:22 am, David Koski wrote:
Regards,
David Koski
dko...@sutinen.com
On 7/8/19 6:16 PM, Adam Goryachev wrote:
On 9/7/19 10:23 am, David Koski wrote:
I am trying to back up about 24TB of data that has millions of
files. It takes a day or two before it starts backing up and then
stops with an error. I did a CLI dump and trapped the output and
can see the error message:
Can't write 32780 bytes to socket
Read EOF: Connection reset by peer
Tried again: got 0 bytes
finish: removing in-process file
Shares/Archives/<path-removed>/COR_2630.png
Child is aborting
Done: 589666 files, 1667429241846 bytes
Got fatal error during xfer (aborted by signal=PIPE)
Backup aborted by user signal
Not saving this as a partial backup since it has fewer files than
the prior one (got 589666 and 589666 files versus 4225016)
dump failed: aborted by signal=PIPE
This backup is doing rsync over ssh. I enabled SSH keepalive but
it does not appear to be due to an idle network. It does not
appear to be a random network interruption because the time it
takes to fail is pretty consistent, about three days. I'm stumped.
Did you check:
$Conf{ClientTimeout} = 72000;
Also, what version of rsync on the client, what version of BackupPC
on the server, etc?
I think BPC v4 handles this scenario significantly better, in fact
a server I used to have trouble with on BPC3.x all the time has
since been combined with 4 other servers (so 4 x the number of files
and total size of data) and BPC4 handles it easily.
Thank you all for your input. More information:
rsync version on client: 3.0.8 (Windows)
rsync version on server: 3.1.2 (Debian)
BackupPC version: 3.3.1
$Conf{ClientTimeout} = 604800;
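For scale, the two ClientTimeout values mentioned in this thread can be converted from seconds into hours and days and set against the "about three days" it takes to fail. This is only a rough sketch: the three-day figure is approximate, and the helper name is made up for illustration.

```python
# Convert the ClientTimeout values quoted in this thread from seconds
# into hours and days, and compare them with the observed time to failure.

SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR


def describe(seconds):
    """Return (hours, days) for a duration given in seconds."""
    return seconds / SECONDS_PER_HOUR, seconds / SECONDS_PER_DAY


suggested = 72000        # value quoted in the earlier reply
configured = 604800      # value reported as set on this server
observed_failure = 3 * SECONDS_PER_DAY  # "about three days", approximate

for label, value in [("suggested", suggested),
                     ("configured", configured),
                     ("observed failure", observed_failure)]:
    hours, days = describe(value)
    print(f"{label}: {value} s = {hours:.1f} h = {days:.2f} d")

# True if the backup dies well before the configured value is reached.
fails_before_timeout = observed_failure < configured
print("fails before configured timeout:", fails_before_timeout)
```

As far as I understand it, ClientTimeout is an inactivity timeout rather than a cap on total run time, so this comparison is only a sanity check and does not by itself rule the setting in or out.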
I just compared the output of two verbose BackupPC_dump runs and it
looks like the files are reported to be backed up even though they
are not. For example, this appears in logs of both backup runs:
create 644 4616/545 1085243184
<path-removed>/<name-removed>3412.zip
I checked and the file time stamp is year 2018. The log files are
full of these. I checked the real time clock on both systems and
they are correct. There are also files that have been backed up
that are not in the logs.
I suspect there are over ten million files but I don't have a good
way of telling now. Oddly, there are about 500,000 files backed up
according to the log captured from BackupPC_dump and almost the same
number actually backed up and found in pc/<host>/0, but they are
different subsets of files. I have been tracking memory and swap
usage on the server and see no issues.
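One way to make the comparison between two captured BackupPC_dump logs repeatable is a small script. The sketch below is in Python and assumes each "create" entry sits on a single line in the form "create <mode> <uid>/<gid> <size> <path>", which is a guess based on the sample quoted above (there the path appears wrapped onto the next line, possibly by the mail client). The sample lines and function names are made up for illustration.

```python
# Compare the sets of paths reported as "create" in two captured
# BackupPC_dump logs. The line layout is ASSUMED from the sample quoted
# in this thread: "create <mode> <uid>/<gid> <size> <path>".
import re

CREATE_RE = re.compile(
    r"^\s*create\s+\d+\s+\d+/\d+\s+\d+\s+(?P<path>.+?)\s*$"
)


def created_paths(lines):
    """Return the set of paths found on 'create' lines."""
    paths = set()
    for line in lines:
        match = CREATE_RE.match(line)
        if match:
            paths.add(match.group("path"))
    return paths


def compare_runs(first, second):
    """Return (in both, only in first, only in second) as sets."""
    a, b = created_paths(first), created_paths(second)
    return a & b, a - b, b - a


# Made-up sample lines for illustration; not taken from the real logs.
run1 = [
    "  create   644  4616/545  1085243184 dir/a.zip",
    "  create   644  4616/545        2048 dir/b.png",
    "Done: 2 files",
]
run2 = [
    "  create   644  4616/545  1085243184 dir/a.zip",
    "  create   644  4616/545        4096 dir/c.png",
]

both, only1, only2 = compare_runs(run1, run2)
print("in both runs:", sorted(both))
print("only in run 1:", sorted(only1))
print("only in run 2:", sorted(only2))
```

With real logs, open file objects can be passed instead of the lists. Comparing against what is actually stored under pc/<host>/0 would need an extra step that is not shown here, because BackupPC 3.x stores file names in a mangled form.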
Is this a possible bug in BackupPC 3.3.1?
Please don't top-post if you can avoid it, at least not on mailing
lists.
I just realised:
Read EOF: Connection reset by peer
This is a networking issue, not BackupPC. In other words, something
has broken the network connection (in the middle of transferring a
file, so I would presume it isn't due to some idle timeout, dropped
NAT entry, etc). BackupPC has been told by the operating system that
the connection is no longer valid, and so it has "cleaned up" by
removing the in-progress file (partial).
I just completed another backup cycle that failed in the same manner
but this time with a continuous ping with captured output. It didn't
miss a beat.
A "continuous ping" doesn't rule out a network connection issue.
You would need to capture the traffic on the network interface with
Wireshark; the capture would then tell you which machine "broke" the
connection. Either way (see below), it could be Windows that is causing
the problem rather than your network.
It takes a day to start (presumably reading ALL the files on the
client takes this long, you could improve disk performance, or
increase RAM on the client to improve this).
You might be right. But it's not a show stopper.
"and then stops with an error" - is that on the first file, or are
some files successfully transferred? Is that the first large file?
Does it always fail on the same file (seems not, since it previously
got many more).
Good points. Confirmed: Not the first file (over 600,000 files
transferred first), not a large file (less than 20 MB), does not
always fail on the same file or directory.
I'm thinking you need to check and/or improve network reliability,
make sure both client and server are not running out of RAM/etc
(mainly the backuppc client, the OOM might kill the rsync process),
etc. Check your system logs on both client and server, and/or watch
top output on both systems during the backup.
The network did not miss a beat and generally appears responsive. It
has been checked. The client and server RAM usage are tracked in
Zabbix and not close to running out. Only curious thing is swap is
running out on the client (Windows Server 2016) even with 10GB RAM
available, but still has about 2GB before crash. Server system logs
(kern.log, syslog) show no signs of issues.
That could be related: BPC 3.x requires the client to traverse the entire
directory structure and store a complete listing of all files and
attributes. BPC 4.x plus a recent version of rsync on Windows removes
this requirement.
Also, I recall a recent report on the mailing list about the way Windows
handles some kind of network traffic timeout, which would regularly
cause these broken connection reports. You should check through at least
the last 2 or 3 months of posts. The report had a detailed analysis of
the root cause of the problem and, I think, a work-around to resolve it.
Sorry, I don't have all the details or a complete memory of it.
Try backing up other systems, try backing up a smaller subset
(exclude some large directories, and then add them back in if you
complete a backup successfully).
That is a good idea. I'll try adding incrementally to the data backed
up.
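A minimal sketch of that incremental approach in Python: start with one top-level directory and add one more on each attempt, so the first attempt that fails points at the directory that was just added. The directory names and function names are hypothetical placeholders.

```python
# Build cumulative subsets of top-level directories so each backup
# attempt covers everything from the previous attempt plus one more.
# Directory names are hypothetical placeholders.


def cumulative_subsets(directories):
    """Yield growing lists: [d1], [d1, d2], [d1, d2, d3], ..."""
    included = []
    for directory in directories:
        included.append(directory)
        yield list(included)


def excluded_for(step, directories):
    """Return the directories still left out at a given step."""
    return [d for d in directories if d not in step]


top_level = ["DirA", "DirB", "DirC"]  # hypothetical names
steps = list(cumulative_subsets(top_level))

for number, step in enumerate(steps, start=1):
    print(f"attempt {number}: include {step}, "
          f"exclude {excluded_for(step, top_level)}")
```

The "exclude" list for each attempt is what would go into the exclude setting ($Conf{BackupFilesExclude} in BackupPC); the exact syntax for the rsync transfer method should be checked against the documentation.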
Overall, I would advise upgrading to BPC v4.x; it handles backups of
systems with a huge number of files much better.
If incrementally adding doesn't solve the problem I'll try an upgrade.
Personally, my current opinion is that BPC 4.x is a better product than
BPC 3.x and you should prefer an upgrade anyway, but that's just my opinion.
Regards,
Adam
_______________________________________________
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List: https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki: http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/