Hi,

Jim Leonard wrote on 2009-08-18 17:00:05 -0500 [[BackupPC-users] BackupPC File::RsyncP issues]:
> First off, I'm a happy user of BackupPC; I'm only posting because I have
> an architecture question resulting in bad performance that I'm hoping
> someone can answer.
> [...]
> With smb, which used smbclient to do the transfers, I was seeing
> transfer speeds of 40-65MB/s over a gigabit network -- with rsync-based
> backups, I am seeing about 6MB/s, ten times slower.

First of all, where are you seeing these figures, and what are you measuring?
The primary purpose of the rsync protocol is to save network bandwidth. So if,
for example, you are transferring only one tenth the amount of data for a full
backup, and that takes the same time as with SMB, your network throughput will
be only one tenth as high. That is not a problem, but rather a feature, and it
indicates that network bandwidth is not, in fact, your bottleneck. There are
other good reasons to use rsync just the same. And, yes, I read your mail in
the other thread, but it's still not obvious what you are actually observing,
and what you are interpreting.

Secondly, what are you comparing? Due to a "feature" of the interpretation of
attrib files by the rsync XferMethod, the first backup (well, all up to the
first full, to be a bit more exact) after switching from non-rsync to rsync
will re-transfer all data (which would make the backup slow, but not
low-bandwidth). In any case, you should run at least one full rsync backup
(per host) before starting measurements.

Have you got very large growing files (or probably: large *changing* files) in
your backup? They could also lead to an explanation (outside File::RsyncP, by
the way).

> I profiled File::RsyncP which is what BackupPC_dump appears to be using,
> and found this troubling report after a profile time of one day:
>
> time elapsed (wall):   86034.3727
> time running program:  85959.5328  (99.91%)
> time profiling (est.):    74.7665  (0.09%)
>
> %Time    Sec.        #calls  sec/call   F  name
> 83.30    71605.7838  913708   0.078368  ?  File::RsyncP::pollChild
> 15.98    13737.1191     261  52.632640     File::RsyncP::writeFlush
>  0.21      176.3028  121432   0.001452     File::RsyncP::getData
> (snip)
>
> As you can see, pollChild is called a ridiculously large number of
> times, which is eating up nearly 70% of the CPU time trying to do a
> backup.

Did you look at the code, or are you inferring that the number is ridiculous
from the name of the function?

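As a sanity check on the quoted profile, the time shares and per-call averages
can be recomputed from the totals and call counts. This is only a minimal
sketch in Python: the numbers are copied from the report above, while the names
PROFILE, RUNNING_TIME and summarize() are made up for illustration and are not
part of BackupPC or File::RsyncP.

```python
# Figures copied from the quoted profile: name -> (total seconds, number of calls).
PROFILE = {
    "File::RsyncP::pollChild": (71605.7838, 913708),
    "File::RsyncP::writeFlush": (13737.1191, 261),
    "File::RsyncP::getData": (176.3028, 121432),
}

# "time running program" from the same report, in seconds.
RUNNING_TIME = 85959.5328


def summarize(profile, running_time):
    """Return {name: (share of running time in percent, average seconds per call)}."""
    return {
        name: (100.0 * seconds / running_time, seconds / calls)
        for name, (seconds, calls) in profile.items()
    }


SUMMARY = summarize(PROFILE, RUNNING_TIME)

# Print one line per function: share of running time and average time per call.
for name, (share, per_call) in SUMMARY.items():
    print(f"{name:26s} {share:6.2f}%  {per_call:10.6f} s/call")
```

The recomputed values agree with the %Time and sec/call columns of the report:
pollChild averages well under 0.1 seconds per call, while each of the 261
writeFlush calls averages about 52.6 seconds.
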
I don't know enough about the rsync protocol (yet) to say for sure if the
number of calls could be reduced and how, but the calls to pollChild() seem to
make sense to me. What strikes *me* as unreasonable is the 261 calls to
writeFlush() taking an average of 52.6 seconds. Or maybe there was a
wrap-around in the counter? You should also note that not all of the work is
done inside File::RsyncP, so it's not 70% of the backup time spent there.

Don't get me wrong. I'm not saying that it wouldn't be good to significantly
increase BackupPC performance, if it can be done in the context of how
BackupPC works or can work.

> This is extremely inefficient and completely explains why my
> backups are taking so long over rsync

Does it? Please share the explanation ...

> So, my questions are:
>
> - Is there a reason BackupPC needs to emulate rsync through File::RsyncP
> instead of just using rsync itself?

Yes. Craig wouldn't have gone to the trouble of implementing File::RsyncP for
BackupPC if there wasn't, would he? (You are aware that Craig is also the
author of BackupPC, aren't you? ;-) How would you propose using rsync to update
a compressed deduplicated pool with a separate directory for each backup,
mangled file names and file attributes stored separately?

> - If not, is anyone maintaining File::RsyncP who can optimize that code
> and/or redesign it?

If there is no reason to use it, someone should optimize it? ;-)

I believe Craig is researching other alternatives (a fuse FS to handle
compression and deduplication, so BackupPC could, in fact, use native rsync).
If that proves unviable, upgrading File::RsyncP to protocol version 30 would
probably be next. But File::RsyncP is open source, so you're free to optimize
it yourself :-). If I find any time at all, I'll take a closer look at the
matter, but that's pretty much an "if (0)" ...

Regards,
Holger
