Robin Lee Powell wrote:
> On Tue, Dec 15, 2009 at 02:33:06PM +0100, Holger Parplies wrote:
>> Robin Lee Powell wrote on 2009-12-15 00:22:41 -0800:
>>> Oh, I agree; in an ideal world, it wouldn't be an issue.  I'm
>>> afraid I don't live there.  :)
>> none of us do, but you're having problems. We aren't.
>
> How many of you are backing up trees as large as I am?  So far,
> everyone who has commented on the matter has said it's not even
> close.

I think most other people have already broken up their large runs on
directory boundaries. I sort of recall someone posting a script some
time ago that does it dynamically as things change.

>> The suggestion that your *software* is probably misconfigured in
>> addition to the *hardware* being flakey makes a lot of sense to
>> me.
>
> Certainly possible, but if it is I genuinely have no idea where the
> misconfiguration might be.  Also note that only the incrementals
> seem to fail; the initial fulls ran Just Fine (tm).  One of them
> took 31 hours.

You did mention firewalls in the path, I think. Is there any
possibility that the incremental directory scan takes so long to find a
change that the firewall times out the connection because there is no
activity? If that is happening, turning on ssh keepalives might help
(ServerAliveInterval in the ssh client config, for instance).

>
> read(0, "", 8184) = 0
> select(2, NULL, [1], [1], {60, 0}) = 1 (out [1], left {60, 0})
> write(1, "K\0\0\10rsync: connection unexpectedly closed (179 bytes received so far) [sender]\n", 79) = -1 EPIPE (Broken pipe)
> --- SIGPIPE (Broken pipe) @ 0 (0) ---

This looks like it thinks the other side closed.

> The really fun part is that the date when the strace exited (was
> doing "strace -p NUM ; date") is 6 hours before the BackupPC server
> claims that the backup aborted.  My ClientTimeout is set to 72000;
> both backups aborted significantly *after* the twenty hour mark.
> It's not relevant anyways, though; the connection was clearly
> broken on the client end long before BackupPC timed out.

That seems plausible if the intermediate firewall broke the connection
by sending a RST to the client while silently dropping packets toward
the server.

> I'm totally willing to accept that the problem might be hardware or
> software config on my end, but:
>
> 1. It seems to only happen with incrementals.

That points to long silent periods as a possibility. What happens if
you force a full instead of an incremental?
It should be somewhat slower because the client must read everything,
but the checksum transfer activity may keep the connection alive.

> 2. I have no idea even where to look; everything looks fine at a
> system level as I understand it.  I don't have the networking skill
> to debug the networking end (the two machines are separate RFC 1918
> address ranges, with a load balancer/firewall associated with each
> (2 total) between them, plus a bunch of switches and so on).

Running

    tcpdump host otherhost and port 22

(where otherhost is the server's name or IP when run on the client, and
the client's when run on the server) would show you the ssh stream
packets. The only interesting part is how it ends: a packet with the
RST flag, or just nothing for a very long time. If you have access to
the firewall configs, you might also look at how long idle connections
are permitted to remain open.

> Given that, it seems completely bizarre to me that you all are, I
> dunno, morally offended? that I proposed increases BackupPC's
> resilience to transient errors as a solution.

No, we are just being pragmatic. Even if such a change can be made, it
isn't going to happen overnight, and perhaps not for years. So from
this side it seems bizarre that you aren't doing the practical things
to work around your problem. The most obvious one might be to move the
server so it is on the same network as the clients. Or break the runs
up on directory boundaries, or try the hack I suggested: remove the
--ignore-times option on fulls so they are fast enough to be practical
all the time, which gives you the saved partials you need.

> I'm certainly not interested in maintaining my own patches or fork.
> I'd like to think that if I made my idea run-time optional y'all
> would roll it in, but the response has been so negative I'm worried.
> Also, it's a lot of work.  -_-

My change would involve commenting out a couple of lines. You would
then have to put them back if you ever want the checksum compare to
verify the backup copy.
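
If reading a long tcpdump capture by eye is too tedious, something like
the following rough Python sketch could summarize how the ssh stream
ended. It is not part of BackupPC or tcpdump. It assumes the capture
was saved as text from a recent tcpdump run with "-tt -n" (epoch
timestamps, no name lookups), which prints flags as "Flags [...]";
older releases print them differently and would need another pattern.
The function name summarize_capture and the sample addresses are made
up for illustration.

```python
import re

# Assumed line shape (recent tcpdump, run with -tt -n):
#   1260883986.123456 IP 10.0.0.5.51234 > 10.1.0.9.22: Flags [P.], seq ...
# IPv6 lines ("IP6") and non-TCP lines are simply skipped.
LINE_RE = re.compile(r"^(\d+\.\d+) IP .*?: Flags \[([^\]]*)\]")


def summarize_capture(lines):
    """Return (longest_gap_seconds, saw_rst, last_flags) for a text capture."""
    prev_ts = None
    longest_gap = 0.0
    saw_rst = False
    last_flags = None
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not a TCP packet line in the assumed format
        ts = float(m.group(1))
        flags = m.group(2)
        if prev_ts is not None:
            # Track the longest silence between consecutive packets.
            longest_gap = max(longest_gap, ts - prev_ts)
        prev_ts = ts
        last_flags = flags
        if "R" in flags:
            saw_rst = True  # some packet carried the RST flag
    return longest_gap, saw_rst, last_flags


# Hypothetical three-packet capture: an hour of silence, then a reset.
sample = [
    "1000.000000 IP 10.0.0.5.51234 > 10.1.0.9.22: Flags [P.], seq 1:49, ack 1, win 501, length 48",
    "1000.500000 IP 10.1.0.9.22 > 10.0.0.5.51234: Flags [.], ack 49, win 501, length 0",
    "4600.500000 IP 10.1.0.9.22 > 10.0.0.5.51234: Flags [R], seq 1, win 0, length 0",
]
gap, saw_rst, last_flags = summarize_capture(sample)
print("longest silence (s):", gap, "RST seen:", saw_rst, "last flags:", last_flags)
```

A long silence followed by a RST on one side, with nothing at all
arriving on the other, would fit the idle-timeout theory. Comparing the
longest gap against the firewall's idle timeout would be the next step.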

-- 
   Les Mikesell
    lesmikes...@gmail.com

_______________________________________________
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/