On Mon, 8 Jan 2007, Wayne Davison wrote:

On Mon, Jan 08, 2007 at 01:37:45AM -0600, Evan Harris wrote:

I've been playing with rsync and very large files approaching and
surpassing 100GB, and have found that rsync has excessively poor
performance on these very large files; the performance appears to
degrade the larger the file gets.

Yes, this is caused by the current hashing algorithm that the sender
uses to find matches for moved data.  The current hash table has a fixed
size of 65536 slots, and can get overloaded for really large files.
...
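If I follow, the shape of the problem is roughly this (a sketch in C of my
understanding, not rsync's actual code, and the names are mine):

    /* Every block's weak checksum lands in one of 65536 fixed slots. */
    #define TABLE_SIZE 65536

    struct sum_entry {
        unsigned int weak_sum;      /* rolling checksum of one block */
        struct sum_entry *next;     /* other blocks sharing this slot */
    };

    static struct sum_entry *table[TABLE_SIZE];

    /* Once the block count runs well past 65536, the chains grow in
     * proportion, and the sender walks one of them for every byte
     * offset it tries to match, so matching slows as the file grows. */
    static struct sum_entry *lookup(unsigned int weak_sum)
    {
        struct sum_entry *e = table[weak_sum % TABLE_SIZE];
        while (e && e->weak_sum != weak_sum)
            e = e->next;
        return e;
    }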

Would it make more sense just to have rsync pick a saner blocksize for very large files? I say that without knowing how rsync selects the blocksize, but I'm assuming that if a 65k-entry hash table is getting overloaded, the blocksize it picks must be way too small. Should it instead be scaling the blocksize (say, by powers of two, based on filesize) rather than the hash table?
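To make it concrete, something like this is what I mean (a sketch only, not a
patch; the 700-byte minimum is just a stand-in for whatever floor rsync
actually uses):

    #include <stdint.h>

    #define HASH_SLOTS     65536
    #define MIN_BLOCK_SIZE 700    /* stand-in, not rsync's real minimum */

    /* Double the block size until the whole file fits in at most
     * HASH_SLOTS blocks, so the existing table never overloads. */
    static uint32_t choose_block_size(int64_t file_len)
    {
        uint32_t blength = MIN_BLOCK_SIZE;

        while ((int64_t)blength * HASH_SLOTS < file_len)
            blength *= 2;

        return blength;
    }

Starting from that 700-byte floor, a 100GB file doubles up to a block size just under 3MB, which brings the block count back under the slot count.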

I know that may result in more network traffic, since a bigger block containing a difference will be considered "changed" and will need to be sent in place of a few smaller ones, but in some circumstances wasting a little more network bandwidth may be wholly warranted. Then maybe the hash table size doesn't matter, since there are far fewer blocks to check.
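To put rough numbers on it: at 16MB blocks, a 100GB file is only about 6400 blocks, which sits comfortably in a 65536-slot table, and the worst case for a single changed byte is retransmitting one 16MB block.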

I haven't tested whether that would work. Will -B accept a value as large as 16MB? At my data rates that's about half a second of network bandwidth, which seems entirely reasonable.
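What I would be trying is something along the lines of:

    rsync -B 16777216 /path/to/bigfile remote:/dest/

i.e. giving -B the size in bytes; whether it actually accepts a value that large is exactly what I'm unsure of.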

Evan
