Hi Y'all,

I'm seeing some interesting behavior that I was hoping someone could shed some light on. Basically I'm trying to rsync a lot of files, in a series of about 60 rsyncs, from one server to another. There are about 160 million files. I'm running 3 rsyncs concurrently to increase the speed, and as each one finishes, another starts, until all 60 are done.
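
A minimal sketch of that kind of scheduling, assuming the transfers are listed one per line and that xargs supports -P (GNU and BSD xargs do; it is not in POSIX). The run_pool name and the directory names in the usage comment are placeholders, not the actual script used here.

```shell
#!/bin/sh
# Sketch: run one command per input item, at most max_jobs at a time.
# run_pool is a hypothetical helper name.
run_pool() {
    max_jobs=$1
    shift
    # xargs -P keeps up to max_jobs children running and starts the next
    # one as soon as any child exits; -n 1 passes one item per invocation.
    xargs -n 1 -P "$max_jobs" "$@"
}

# Example usage (commented out so the sketch has no side effects;
# "genomes" and "other_dir" stand in for the real list of about 60 entries):
# printf '%s\n' genomes other_dir |
#     run_pool 3 sh -c 'rsync -a --delete "rsync://encodek-0-4/data/$1/" "/hive/data/$1/"' sh
```

With 3 as the job limit, a fourth transfer starts only when one of the first three exits, which matches the pattern described above.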

The machine I'm initiating the rsyncs on has 48GB RAM. It runs CentOS Linux 5.4, kernel revision 2.6.18-164.15.1.el5, with rsync version 3.0.5 (on both sides).

I was able to rsync all the data over to the new machine. But because there was so much data, I need to run the rsyncs again to catch data that changed during the previous run. This second pass more or less hangs midway through.

What happens is that as the rsyncs run, the memory usage on the machine slowly creeps up, using quite a bit of RAM, which is odd because I thought the rsyncs were counting files incrementally, to reduce RAM impact. But, looking at top, the rsync processes aren't using much RAM at all:

top - 12:22:10 up 1 day, 27 min,  1 user,  load average: 46.85, 46.37, 44.97
Tasks: 309 total,   8 running, 301 sleeping,   0 stopped,   0 zombie
Cpu(s): 1.0%us, 13.8%sy, 0.0%ni, 84.9%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Mem:  49435196k total, 34842524k used, 14592672k free,   141748k buffers
Swap: 10241428k total,        0k used, 10241428k free,    49428k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 7351 root      25   0 19892 9.8m  844 R 100.1  0.0 552:58.55 rsync
 9084 root      16   0 13108 2904  820 R 100.1  0.0 299:24.59 rsync
 4759 root       0 -20 1447m  94m  15m S 29.9  0.2 667:34.21 mmfsd
 9539 root      16   0 30136  19m  820 R  6.3  0.0   6:29.28 rsync
 9540 root      15   0  271m  46m  260 S  0.3  0.1   0:12.13 rsync
10047 root      15   0 10992 1212  768 R  0.3  0.0   0:00.01 top
    1 root      15   0 10348  700  592 S  0.0  0.0   0:02.15 init
...etc...

Nevertheless, 34GB of RAM is in use. What really kills things is that at some point each rsync suddenly ramps up to 100% CPU usage, and all activity for that rsync essentially stops. In the above example, 2 of the 3 rsyncs are in that 100% CPU state, while the third rsync is only at 6.3%, but that is the one actually doing something. In some cases all 3 rsyncs get to 100% and they all stall: there is no network traffic on the NIC at all and they don't progress.
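
To pick out the stalled processes without reading the listing by eye, something like the following sketch could be used. It assumes the default top column order shown above (PID first, %CPU ninth, COMMAND twelfth); busy_rsync_pids is a hypothetical helper name and the default threshold of 99 is arbitrary.

```shell
# Sketch: print the PIDs of rsync processes at or above a CPU threshold,
# reading top-style process rows on stdin.
# Assumes the default top column layout: $1 = PID, $9 = %CPU, $12 = COMMAND.
busy_rsync_pids() {
    threshold=${1:-99}
    awk -v t="$threshold" '$12 == "rsync" && $9 + 0 >= t + 0 { print $1 }'
}

# Example usage (commented out; top -b -n 1 prints one batch-mode snapshot):
# top -b -n 1 | busy_rsync_pids 99
```

Attaching `strace -p <pid>` to one of the reported PIDs would then show whether that rsync is still making system calls or is spinning without any.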

Now mostly what they are doing is counting files, since most of the files are the same on both sides, but there are just so many files (160 million). I don't seem to be out of memory, but I don't know why rsync would go to 100% CPU and just stall.
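
The memory figures in the top header above can be broken down a little further. This sketch subtracts buffers and page cache from the "used" figure, assuming the Mem:/Swap: line layout of this version of top; non_cache_used_kb is a hypothetical helper name.

```shell
# Sketch: memory in use that is neither buffers nor page cache, in kB,
# computed from the "Mem:" and "Swap:" summary lines of top on stdin.
# Assumes the layout shown above: used is field 4 of Mem:, buffers is
# field 8 of Mem:, and cached is field 8 of Swap:.
non_cache_used_kb() {
    awk '
        $1 == "Mem:"  { used = $4 + 0; buffers = $8 + 0 }  # "+ 0" drops the trailing k
        $1 == "Swap:" { cached = $8 + 0 }
        END { printf "%d\n", used - buffers - cached }
    '
}

# Example usage (commented out):
# top -b -n 1 | non_cache_used_kb
```

For the snapshot above this gives 34842524 - 141748 - 49428 = 34651348 kB, roughly 33 GiB, so almost none of the used memory is buffers or page cache, and the process list does not account for it either. Where it actually goes is not visible in top; /proc/meminfo (for example its Slab line) might be the next place to look, though that is only a guess.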

I am rsyncing from an rsync server to my local server, with commands similar to this:

rsync -a --delete rsync://encodek-0-4/data/genomes/ /hive/data/genomes/

Again, both sides are at version 3.0.5. Nothing fancy or special. I have confirmed that it does count the files incrementally by running a few manually; it does report "getting incremental file list...".

Any ideas why the processes go to 100% CPU and then stall? I should also note that the initial run of rsyncs, where it was actually copying a ton of data, did not seem to have this problem, but now that the data is there and I'm rsyncing again, it seems to have this problem. Is it somehow related to the fact that it is mostly comparing a ton of files very quickly but not actually copying many of them?

Thanks for any ideas!

-erich
--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html