Evan Harris wrote:
> Would it make more sense just to make rsync pick a more sane blocksize
> for very large files?  I say that without knowing how rsync selects
> the blocksize, but I'm assuming that if a 65k entry hash table is
> getting overloaded, it must be using something way too small.

rsync picks a block size that is the square root of the file size. As I
didn't write this code, I can safely say that it seems like a very good
compromise between too small block sizes (too many hash lookups) and too
large block sizes (decreased chance of finding matches).

> Should it be scaling the blocksize with a power-of-2 algorithm rather
> than the hash table (based on filesize)?

If Wayne intends to make the hash size a power of 2, maybe selecting
block sizes that are smaller will make sense. We'll see how 3.0 comes
along.

> I haven't tested to see if that would work.  Will -B accept a value of
> something large like 16meg?

It should. That's about 10 times the block size you need in order to not
overflow the hash table, though, so a block size of 2MB would seem more
appropriate to me for a file size of 100GB (see the sketch below).

> At my data rates, that's about a half a second of network bandwidth,
> and seems entirely reasonable.
>
> Evan

I would just like to note that since I submitted the "large hash table"
patch, I have seen no feedback on anyone actually testing it. If you can
compile a patched rsync and report how it goes, that would be very
valuable to me.
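For the arithmetic above, here is a minimal standalone sketch (illustrative
only, not rsync source code; pick_block_size and HASH_ENTRIES are made-up
names) that shows what the square-root rule gives for a 100GB file, and the
smallest block size that keeps the block count within a 65536-entry hash
table:

#include <stdio.h>
#include <stdint.h>
#include <math.h>

/* Illustrative numbers only -- this is not rsync's code. It assumes the
 * square-root block-size rule and the 65k-entry (65536) hash table
 * discussed above. */
#define HASH_ENTRIES 65536LL

static int64_t pick_block_size(int64_t file_size)
{
    /* block size ~ sqrt(file size), per the rule described above */
    return (int64_t)sqrt((double)file_size);
}

int main(void)
{
    int64_t file_size = 100LL * 1024 * 1024 * 1024;        /* 100GB */

    int64_t bsize  = pick_block_size(file_size);
    int64_t blocks = file_size / bsize;
    printf("sqrt rule:    %lld-byte blocks -> %lld blocks (table holds %lld)\n",
           (long long)bsize, (long long)blocks, (long long)HASH_ENTRIES);

    /* To keep the block count within the hash table, the block size
     * must be at least file_size / HASH_ENTRIES. */
    int64_t min_bsize = file_size / HASH_ENTRIES;
    printf("no overflow:  block size >= %lld bytes (about %.1f MB)\n",
           (long long)min_bsize, min_bsize / (1024.0 * 1024.0));

    return 0;
}

Compiled with -lm and run, it reports roughly 320KB blocks (about 327k of
them, so several times the table size) under the square-root rule, and a
minimum of about 1.6MB per block to stay within the table, which is why 2MB
looks about right and 16MB is roughly 10 times what you need.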
Shachar