On 10/7/07, Wayne Davison <[EMAIL PROTECTED]> wrote:
> On Mon, Jan 08, 2007 at 10:16:01AM -0800, Wayne Davison wrote:
> > And one final thought that occurred to me: it would also be possible
> > for the sender to segment a really large file into several chunks,
> > handling each one without overlap, all without the generator or the
> > receiver knowing that it was happening.
>
> I have a patch that implements this:
>
> http://rsync.samba.org/ftp/unpacked/rsync/patches/segment_large_hash.diff
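The quoted segmenting idea can be illustrated with a minimal Python sketch. It assumes one reading of the description: each segment of the new file is compared only against basis-file checksums from the same segment. It is not the code in segment_large_hash.diff. The function names and the `segment_blocks` parameter are hypothetical, and for brevity the sketch compares only block-aligned positions with MD5 instead of using rsync's rolling checksum.

```python
import hashlib


def block_digests(data: bytes, block_size: int) -> list:
    """MD5 digest of each fixed-size block of `data` (the last may be short)."""
    return [hashlib.md5(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def segmented_matches(basis: bytes, new: bytes, block_size: int,
                      segment_blocks: int) -> int:
    """Count block-aligned blocks of `new` that are found in `basis` when
    each segment of `new` is matched only against the basis blocks of the
    same segment (one small lookup table per segment, never overlapping).
    Hypothetical illustration, not rsync's implementation."""
    basis_sums = block_digests(basis, block_size)
    new_sums = block_digests(new, block_size)
    matched = 0
    for start in range(0, len(new_sums), segment_blocks):
        # The lookup table holds only this segment's basis checksums.
        table = set(basis_sums[start:start + segment_blocks])
        matched += sum(1 for d in new_sums[start:start + segment_blocks]
                       if d in table)
    return matched


if __name__ == "__main__":
    # 64 distinct 16-byte blocks standing in for a disk image.
    blocks = [bytes([i]) * 16 for i in range(64)]
    basis = b"".join(blocks)
    # "Rearrange the partitions": swap the two halves of the image.
    new = b"".join(blocks[32:] + blocks[:32])
    print(segmented_matches(basis, new, 16, segment_blocks=64))  # → 64
    print(segmented_matches(basis, new, 16, segment_blocks=16))  # → 0
```

With one segment covering the whole file every block is found; with 16-block segments the swapped halves land in different segments and nothing matches, which is the disk-image case raised in the reply.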
I like the better performance, but I'm not entirely happy with a fixed upper limit on the distance that data can migrate and still be matched by the delta-transfer algorithm. If someone copies an image of an entire hard disk after rearranging the partitions within it, rsync will needlessly retransmit all the partition data.

An alternative would be to use several different block sizes, spaced by a factor of 16 or so, with a separate hash table for each. Each hash table would hold checksums for a sliding window of 8/10*TABLESIZE blocks around the current position. This way, small blocks could be matched across small distances without overloading the hash table, and large blocks could still be matched across large distances.

Matt
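The multi-block-size proposal above can be sketched in Python under a few stated assumptions: block sizes grow by a factor of 16 per level, and each level keeps its own table of checksums for a sliding window of about 8/10 of a hypothetical `table_size` blocks centred on the current block. All names are illustrative. The sketch uses MD5 over block-aligned positions rather than rsync's rolling and strong checksums, and it reports how many bytes of the new file some level could match rather than producing an actual delta.

```python
import hashlib
from collections import Counter


def _digest(data: bytes, j: int, block_size: int) -> bytes:
    """MD5 of block j of `data` at the given block size."""
    return hashlib.md5(data[j * block_size:(j + 1) * block_size]).digest()


def level_matches(basis: bytes, new: bytes, block_size: int,
                  table_size: int) -> list:
    """Indices of block-aligned blocks of `new` that match a basis block
    inside a sliding window of about 8/10 * table_size blocks centred on
    the current block index (hypothetical sketch)."""
    half = (table_size * 8 // 10) // 2       # blocks reachable each way
    n_basis = -(-len(basis) // block_size)   # ceiling division
    n_new = -(-len(new) // block_size)
    window = Counter()
    # Prime the window for current block 0: basis blocks 0..half.
    for j in range(min(half + 1, n_basis)):
        window[_digest(basis, j, block_size)] += 1
    matched = []
    for c in range(n_new):
        if c > 0:
            # Slide the window: add block c + half, drop block c - half - 1.
            add, drop = c + half, c - half - 1
            if add < n_basis:
                window[_digest(basis, add, block_size)] += 1
            if 0 <= drop < n_basis:
                d = _digest(basis, drop, block_size)
                window[d] -= 1
                if window[d] == 0:
                    del window[d]
        if _digest(new, c, block_size) in window:
            matched.append(c)
    return matched


def multi_level_coverage(basis: bytes, new: bytes, base_block: int,
                         levels: int, table_size: int,
                         factor: int = 16) -> int:
    """Bytes of `new` matched by at least one level, where level k uses
    blocks of base_block * factor**k and its own sliding-window table."""
    covered = bytearray(len(new))            # 1 where some level matched
    for k in range(levels):
        size = base_block * factor ** k
        for c in level_matches(basis, new, size, table_size):
            for i in range(c * size, min((c + 1) * size, len(new))):
                covered[i] = 1
    return sum(covered)


if __name__ == "__main__":
    # 256 distinct 16-byte blocks (4096 bytes) standing in for a disk image.
    basis = b"".join(i.to_bytes(2, "big") * 8 for i in range(256))
    # Swap the two 2048-byte halves, as if two partitions were exchanged.
    new = basis[2048:] + basis[:2048]
    print(multi_level_coverage(basis, new, 16, levels=1, table_size=40))  # → 0
    print(multi_level_coverage(basis, new, 16, levels=2, table_size=40))  # → 4096
```

With `table_size=40` each window reaches 16 blocks each way. For 16-byte blocks that is only 256 bytes, too short for a 2048-byte move. For 256-byte blocks it is 16 × 256 = 4096 bytes, so the larger level matches the whole swapped image while each table stays the same size.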