On Sat, Mar 12, 2011 at 5:07 AM, Jeffrey J. Kosowsky <backu...@kosowsky.org> wrote:
> In particular with regard to metrics you seek, I don't know whether it is
> better/worse to have one file with 2N links or N files with 2 links. Your
> metrics don't distinguish that and depending on how the list of hard links is
> constructed that may or may not be a big difference. Specifically, in the 1st
> case, does the link list still have O(N) entries or just O(1) entries -- huge
> difference potentially.
>
> More generally, I'm really wondering whether perhaps rsync could be
> patched/modified to work better in edge cases like

You guys are definitely working at a deeper level than me 8-)

I'm not seeking a formula that can predict how well rsync will handle a given TOPDIR, just a set of data points we can collect from BPC users when discussing this issue.

So when a BPC user says "rsync is working fine for me to clone my whole filesystem, and I've got a 'really big' TOPDIR", I'm proposing we have a standard set of questions that gets us the relevant facts, so we can discuss the issues more meaningfully.

Here's what I've got so far (assuming TOPDIR is in the standard spot):

1. Exact version of rsync: a single version for a local copy, the version on both ends for client/server.
2. Total number of inodes - "df -i /var/lib/backuppc/"
3. Total number of files that have more than one hard link - "find /var/lib/backuppc/ -type f -links +1"
4. Total physical RAM in the machine
5. "memfree" stats from running "free -m", before running rsync and again, say, 10 minutes into the job.

Once we've collected enough of these profiles, they may eventually lead to rough guidelines/predictions regarding rsync, e.g. how much RAM you'd need.

Improving or assisting rsync to handle these "edge case" needs is far beyond my own scope; I'd personally just use one of the workarounds I mentioned previously if it looked like rsync was struggling.

So what I'm looking for here (not only from you but from anyone who can help) is:

A. confirmation that each of the example commands above is a good one for collecting the given data point, or suggestions for better ones, and
B. confirmation/suggestions on the data points themselves - are any completely irrelevant, or should any be added? For example, I believe total disk space used by TOPDIR is basically irrelevant, correct?
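
As a rough sketch of how the data points above might be gathered in one pass (not a tested BackupPC tool): the function names count_linked_paths, count_linked_inodes and collect_profile are made up for illustration, and the script assumes a Linux box where "df -i" and "free -m" are available. Note that the plain "find ... -links +1" lists every path name, so a file with k names under the tree shows up k times; the second counter collapses those to distinct inodes.

```shell
#!/bin/sh
# Sketch only: collect the proposed data points for a BackupPC TOPDIR.
# count_linked_paths, count_linked_inodes and collect_profile are
# hypothetical helper names, not part of BackupPC or rsync.

# Number of path names whose file has more than one hard link
# (a file with k names under the tree is counted k times).
count_linked_paths() {
    find "$1" -type f -links +1 | wc -l | tr -d ' '
}

# Number of distinct inodes behind those path names
# (a file with k names under the tree is counted once).
count_linked_inodes() {
    find "$1" -type f -links +1 -exec ls -i {} + |
        awk '{ print $1 }' | sort -u | wc -l | tr -d ' '
}

# Print one profile; the default path assumes TOPDIR is in the standard spot.
collect_profile() {
    top=${1:-/var/lib/backuppc}
    echo "topdir: $top"
    if command -v rsync >/dev/null 2>&1; then
        echo "rsync: $(rsync --version | head -n 1)"
    else
        echo "rsync: not found"
    fi
    echo "inodes:"
    df -i "$top"
    echo "multiply-linked paths: $(count_linked_paths "$top")"
    echo "multiply-linked inodes: $(count_linked_inodes "$top")"
    # The Mem: row of free -m carries both total RAM and free memory.
    if command -v free >/dev/null 2>&1; then
        free -m
    fi
}
```

Usage would be something like "collect_profile /var/lib/backuppc", run once before starting rsync and again about 10 minutes into the job. Reporting both counts may help with the "one file with 2N links vs. N files with 2 links" question: a large gap between paths and inodes suggests a few files with many links each.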
_______________________________________________
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/