On Mon, 8 Jan 2007, Wayne Davison wrote:

On Mon, Jan 08, 2007 at 01:37:45AM -0600, Evan Harris wrote:

I've been playing with rsync and very large files approaching and
surpassing 100GB, and have found that rsync has excessively poor
performance on these very large files; the performance appears to
degrade the larger the file gets.

Yes, this is caused by the current hashing algorithm that the sender
uses to find matches for moved data.  The current hash table has a fixed
size of 65536 slots, and can get overloaded for really large files.
...
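If I follow, the shape of the problem is roughly this (a sketch in C of my
understanding, not rsync's actual code, and the names are mine):

    /* Every block's weak checksum lands in one of 65536 fixed slots. */
    #define TABLE_SIZE 65536

    struct sum_entry {
        unsigned int weak_sum;      /* rolling checksum of one block */
        struct sum_entry *next;     /* other blocks sharing this slot */
    };

    static struct sum_entry *table[TABLE_SIZE];

    /* Once the block count runs well past 65536, the chains grow in
     * proportion, and the sender walks one of them for every byte
     * offset it tries to match, so matching slows as the file grows. */
    static struct sum_entry *lookup(unsigned int weak_sum)
    {
        struct sum_entry *e = table[weak_sum % TABLE_SIZE];
        while (e && e->weak_sum != weak_sum)
            e = e->next;
        return e;
    }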

Would it make more sense just to have rsync pick a saner blocksize for very large files? I say that without knowing how rsync selects the blocksize, but I'm assuming that if a 65k-entry hash table is getting overloaded, the blocksize it picks must be way too small. Should it instead be scaling the blocksize (say, by powers of two, based on filesize) rather than the hash table?
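To make it concrete, something like this is what I mean (a sketch only, not a
patch; the 700-byte minimum is just a stand-in for whatever floor rsync
actually uses):

    #include <stdint.h>

    #define HASH_SLOTS     65536
    #define MIN_BLOCK_SIZE 700    /* stand-in, not rsync's real minimum */

    /* Double the block size until the whole file fits in at most
     * HASH_SLOTS blocks, so the existing table never overloads. */
    static uint32_t choose_block_size(int64_t file_len)
    {
        uint32_t blength = MIN_BLOCK_SIZE;

        while ((int64_t)blength * HASH_SLOTS < file_len)
            blength *= 2;

        return blength;
    }

Starting from that 700-byte floor, a 100GB file doubles up to a block size just under 3MB, which brings the block count back under the slot count.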

I know that may result in more network traffic, since a bigger block containing a difference will be considered "changed" and will need to be sent in place of a few smaller ones, but in some circumstances wasting a little more network bandwidth may be wholly warranted. Then maybe the hash table size doesn't matter, since there are far fewer blocks to check.
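To put rough numbers on it: at 16MB blocks, a 100GB file is only about 6400 blocks, which sits comfortably in a 65536-slot table, and the worst case for a single changed byte is retransmitting one 16MB block.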

I haven't tested whether that would work. Will -B accept a value as large as 16MB? At my data rates that's about half a second of network bandwidth, which seems entirely reasonable.
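What I would be trying is something along the lines of:

    rsync -B 16777216 /path/to/bigfile remote:/dest/

i.e. giving -B the size in bytes; whether it actually accepts a value that large is exactly what I'm unsure of.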

Evan
