Hi,

Les Mikesell wrote on 2009-05-20 08:36:49 -0500 [Re: [BackupPC-users] A problem really hard to handle : BackupPc Crashing]:
> SebClight wrote:
> > As this being my first post, I'll maybe do it wrong :P

Well, since you're asking, Backup Central seems to have managed to mess
up your log file quote - the times are "really hard to handle". Also,
your subject line should describe the problem, not the fact that you
think it is hard to debug (or important or whatever). Aside from that,
I've seen worse posts from some regular participants :).

> > I have a BackupPc instance (3.1.0) on Debian Lenny. It randomly crashes
> > but mainly during night hours. I have to restart the service once a day
> > and sometimes more. I have more than 40 hosts to backup (most of them are
> > just websites so they're not very big).
> >
> > See some samples of the logs :
> >
> > [...]
> > 2009-05-18 20:37:39 Started incr backup on Website02
> > (pid=18496, share=Websites)
> > 2009-05-18 20:46:08 Got signal PIPE... cleaning up

This one was in the middle of a backup.

> > [...]
> > See ? It crashed at 29h46 right after the "Got signal PIPE... cleaning up"
> > command.

Well, at that time of day, I'd crash too ;-).

> > [...]
> > 2009-05-20 00:25:01 Started full backup on website05
> > (pid=15173, share=Websies)
> > 2009-05-20 00:25:06 Finished full backup on website05

Possibly this one is completing rather fast because you've misspelt the
share name?

> > I think this is quite a challenging problem...

I think it's quite an annoying problem, because it's almost certainly
outside BackupPC (hardware, broken binaries, system configuration ...).

> > As you can see too, backups are running quite normally. I can backup hosts
> > or dirs manually on the web interface and backups seem to run normally as
> > well but when it gets this "Got signal PIPE... cleaning up" signal, it
> > crashes.

Strictly speaking, it's not a crash. BackupPC terminates cleanly after
encountering an unexpected situation. Whether that is the correct
response here is a different matter. I'd argue for just closing the
socket responsible, though I'm not sure how I'd implement that.
It probably means making all output to sockets event-driven ...

> > Any ideas ?
>
> Just a wild guess, but linux will kill process more or less at random if
> you run out of ram and swap space - and rsync can use a lot of memory if
> there are many files on the target. The PIPE signal just tells you
> that a child process died when reading/writing to it, so that's not much
> to go on.

I believe you actually get a SIGPIPE only when writing to a socket that
has been closed on the other end. Reading should simply give you an EOF
condition. I can think of only two ways this could happen:

1.) Someone sent a *command* to the server (see
    Main_Check_Client_Messages in BackupPC) and didn't wait for a
    response. This should not happen. Possible causes are a corrupt
    script or a client dying unexpectedly, maybe due to the system
    running out of swap (in a very narrow time frame, though).

2.) Someone initiated a connection to the server and closed it (or died)
    right away, before the server had time to write a seed to the
    connection. This should also not happen. Note that this gives any
    unauthenticated attacker with the ability to open a connection to the
    BackupPC server a DOS attack vector (I don't think that is happening,
    I just think it needs to be fixed).

It seems unlikely to me that an OOM condition would always kill a child
process (at an unlikely point in time), and never the BackupPC server
itself or other important system processes. Moreover, it didn't sound
like backups with many files on the target ("most of them are just
websites so they're not very big"), and the share names don't sound like
rsync (though it could be rsyncd).

What I'd try:

- 'apt-get install --reinstall backuppc'
  Make sure the package is re-downloaded (though a corrupt package
  should fail the MD5 checks).
- alternatively, check the installed package for corruption:
  'debsums -c backuppc'
- likewise, check the Perl dependencies:
  'debsums -c perl perl-modules perl-base'
  (and possibly others I've missed)
- test system memory
- add debugging messages to the BackupPC daemon to see where the SIGPIPE
  is triggered; try to find out which script is causing it
- move BackupPC_nightly to a different time (see WakeupSchedule in the
  config file) to see if the "crashes" correlate with BackupPC_nightly
  execution time

Hope that helps.

Regards,
Holger
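
A minimal Python sketch of the SIGPIPE-versus-EOF distinction discussed above. This is not BackupPC code, which is Perl. It assumes a POSIX system and uses a connected AF_UNIX socket pair as a stand-in for the daemon/client connection. CPython ignores SIGPIPE by default, so the failed write surfaces as BrokenPipeError (EPIPE) instead of the signal. A process that keeps the default disposition would receive SIGPIPE, whose default action terminates the process. A process with a PIPE handler, as the "Got signal PIPE... cleaning up" log line suggests, would run that handler. The broadcast() helper is hypothetical. It illustrates the "close just the socket responsible" idea using per-write error handling, which is simpler than the event-driven approach mentioned above.

```python
import socket
from typing import List


def write_after_peer_closed() -> str:
    """Write to a stream socket whose peer has already closed it."""
    server_end, client_end = socket.socketpair()  # connected AF_UNIX stream pair
    client_end.close()                            # the client goes away early
    try:
        server_end.sendall(b"seed\n")             # stand-in for the server's first write
        return "write succeeded"
    except BrokenPipeError:
        # EPIPE: CPython ignores SIGPIPE, so the error is raised instead
        return "EPIPE"
    finally:
        server_end.close()


def read_after_peer_closed() -> bytes:
    """Read from a stream socket whose peer has already closed it."""
    server_end, client_end = socket.socketpair()
    client_end.close()
    try:
        return server_end.recv(1024)  # b"" means EOF; no signal, no exception
    finally:
        server_end.close()


def broadcast(clients: List[socket.socket], msg: bytes) -> List[socket.socket]:
    """Hypothetical helper: send msg to every client, dropping only dead ones."""
    alive = []
    for sock in clients:
        try:
            sock.sendall(msg)
            alive.append(sock)
        except (BrokenPipeError, ConnectionResetError):
            sock.close()  # close just this socket and keep serving the others
    return alive


if __name__ == "__main__":
    print(write_after_peer_closed())  # EPIPE
    print(read_after_peer_closed())   # b''

    good_srv, good_cli = socket.socketpair()
    dead_srv, dead_cli = socket.socketpair()
    dead_cli.close()
    alive = broadcast([good_srv, dead_srv], b"status\n")
    print(len(alive))                 # 1
    print(good_cli.recv(1024))        # b'status\n'
    for s in (good_srv, good_cli):
        s.close()
```

The point of catching the error per socket is that one vanished client costs only its own connection, not the whole server. A real daemon would still need to think about partial writes and blocking, which sendall() hides in this sketch.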