Re: Problem with checksum failing on large files

2002-10-14 Thread Donovan Baarda
On Mon, Oct 14, 2002 at 12:36:36AM -0700, Craig Barratt wrote: craig My theory is that this is expected behavior given the check sum size. [...] But it just occurred to me that checksum collisions is only part of the story. Things are really a lot worse. Let's assume that the block

Re: Problem with checksum failing on large files

2002-10-14 Thread jw schultz
On Mon, Oct 14, 2002 at 10:45:44PM +1000, Donovan Baarda wrote: In conclusion, a blocksize of 700 with the current 48bit signature blocksum has an unacceptable failure rate (5%) for any file larger than 100M, unless the file being synced is almost identical. Increasing the blocksize will
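
The 5% figure quoted above can be roughly reproduced with a simple worst-case estimate. The sketch below is an assumption, not the calculation from the (truncated) message: it supposes the two files share almost no data, so every byte offset of the file is tested against every block signature, and that the 48-bit signature (32-bit rolling checksum plus 2 bytes of MD4) behaves like a uniformly random value. The function name and parameters are hypothetical.

```python
import math


def false_match_probability(file_bytes, block_size, checksum_bits):
    """Estimate the chance of at least one false block match.

    Assumed model (not taken from the thread): each of the file's byte
    offsets is compared against each block signature, and every
    comparison collides independently with probability 2**-checksum_bits.
    """
    n_blocks = math.ceil(file_bytes / block_size)
    comparisons = file_bytes * n_blocks
    expected_collisions = comparisons / 2.0 ** checksum_bits
    # P(at least one collision) = 1 - exp(-expected), computed stably.
    return -math.expm1(-expected_collisions)


if __name__ == "__main__":
    for size_mb in (10, 100, 1000):
        p = false_match_probability(size_mb * 10**6, 700, 48)
        print(f"{size_mb:>5} MB, block size 700, 48 bits: {p:.1%}")
```

Under these assumptions the estimate grows with the square of the file size, which is why a signature length that is adequate for small files becomes unreliable for files of hundreds of megabytes. Files that are almost identical trigger far fewer comparisons, so the real failure rate for them would be much lower than this worst case.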

Re: Problem with checksum failing on large files

2002-10-14 Thread jw schultz
On Sun, Oct 13, 2002 at 12:32:42PM +1000, Donovan Baarda wrote: On Sat, Oct 12, 2002 at 11:13:50AM -0700, Derek Simkowiak wrote: My theory is that this is expected behavior given the check sum size. Craig, Excellent analysis! I was a bit concerned about his maths at first,

RE: Problem with checksum failing on large files

2002-10-14 Thread Terry Reed
-----Original Message----- From: Derek Simkowiak [mailto:[EMAIL PROTECTED]] Sent: Saturday, October 12, 2002 2:14 PM To: Craig Barratt Cc: Terry Reed; Donovan Baarda; '[EMAIL PROTECTED]' Subject: Re: Problem with checksum failing on large files My theory is that this is expected

Re: Problem with checksum failing on large files

2002-10-14 Thread Donovan Baarda
On Mon, Oct 14, 2002 at 06:22:36AM -0700, jw schultz wrote: On Mon, Oct 14, 2002 at 10:45:44PM +1000, Donovan Baarda wrote: [...] Does the first pass signature block checksum really only use 2 bytes of the md4sum? That seems pretty damn small to me. For 100M~1G you need at least 56bits,

Re: Problem with checksum failing on large files

2002-10-14 Thread Craig Barratt
I tried --block-size=4096 and -c --block-size=4096 on 2 files (2.35 GB and 2.71 GB) and still had the same problem - rsync still needed to do a second pass to successfully complete. These tests were between a Solaris client and an AIX server (both running rsync 2.5.5). Yes, for 2.35GB there is a 92% chance,

RE: Problem with checksum failing on large files

2002-10-14 Thread Terry Reed
Would you mind trying the following? Build a new rsync (on both sides, of course) with the initial csum_length set to, say 4, instead of 2? You will need to change it in two places in checksum.c; an untested patch is below. Note that this test version is not compatible with standard

Re: Problem with checksum failing on large files

2002-10-14 Thread Craig Barratt
Would you mind trying the following? Build a new rsync (on both sides, of course) with the initial csum_length set to, say 4, instead of 2? You will need to change it in two places in checksum.c; an untested patch is below. Note that this test version is not compatible with standard

Re: Problem with checksum failing on large files

2002-10-14 Thread jw schultz
On Tue, Oct 15, 2002 at 02:25:00AM +1000, Donovan Baarda wrote: On Mon, Oct 14, 2002 at 06:22:36AM -0700, jw schultz wrote: On Mon, Oct 14, 2002 at 10:45:44PM +1000, Donovan Baarda wrote: [...] Does the first pass signature block checksum really only use 2 bytes of the md4sum? That

Re: Problem with checksum failing on large files

2002-10-14 Thread Donovan Baarda
On Mon, Oct 14, 2002 at 04:50:27PM -0700, jw schultz wrote: On Tue, Oct 15, 2002 at 02:25:00AM +1000, Donovan Baarda wrote: On Mon, Oct 14, 2002 at 06:22:36AM -0700, jw schultz wrote: On Mon, Oct 14, 2002 at 10:45:44PM +1000, Donovan Baarda wrote: [...] Does the first pass signature

Re: Problem with checksum failing on large files

2002-10-13 Thread Greg Burley
Hi all, This is an interesting problem ;-) I think I understand Craig's theory, but what I don't understand is why the transfer is successful the second time rsync is applied to Terry's large file. Aren't the two blocks that are actually different that matched by chance going to match every

Re: Problem with checksum failing on large files

2002-10-12 Thread Craig Barratt
terry I'm having a problem with large files being rsync'd twice terry because of the checksum failing. terry Is there a different checksum mechanism used on the second terry pass (e.g., different length)? If so, perhaps there is an terry issue with large files for what is used by default for

Re: Problem with checksum failing on large files

2002-10-12 Thread Derek Simkowiak
My theory is that this is expected behavior given the check sum size. Craig, Excellent analysis! Assuming your hypothesis is correct, I like the adaptive checksum idea. But how much extra processor overhead is there with a larger checksum bit size? Is it worth the extra

Re: Problem with checksum failing on large files

2002-10-12 Thread Donovan Baarda
On Sat, Oct 12, 2002 at 11:13:50AM -0700, Derek Simkowiak wrote: My theory is that this is expected behavior given the check sum size. Craig, Excellent analysis! I was a bit concerned about his maths at first, but I did it myself from scratch using a different approach and got the

Re: Problem with checksum failing on large files

2002-10-12 Thread Donovan Baarda
On Sat, Oct 12, 2002 at 07:29:36PM -0700, jw schultz wrote: On Sat, Oct 12, 2002 at 11:13:50AM -0700, Derek Simkowiak wrote: My theory is that this is expected behavior given the check sum size. Craig, Excellent analysis! Assuming your hypothesis is correct, I like the

Problem with checksum failing on large files

2002-10-11 Thread Terry Reed
I'm having a problem with large files being rsync'd twice because of the checksum failing. The rsync appears to complete on the first pass, but then is done a second time (with second try successful). When some debug code was added to receiver.c, I saw that the checksum for the remote file the

Re: Problem with checksum failing on large files

2002-10-11 Thread Derek Simkowiak
I'm having a problem with large files being rsync'd twice because of the checksum failing. I think this was reported recently. Please try using the -c option (always checksum) and see if that makes the problem go away. This is a high priority bug for me (although I

RE: Problem with checksum failing on large files

2002-10-11 Thread Terry Reed
-----Original Message----- From: Derek Simkowiak [mailto:dereks;itsite.com] Sent: Friday, October 11, 2002 1:51 PM To: Terry Reed Cc: '[EMAIL PROTECTED]' Subject: Re: Problem with checksum failing on large files I'm having a problem with large files being rsync'd twice because

Re: Problem with checksum failing on large files

2002-10-11 Thread Derek Simkowiak
same version of rsync at both ends would avoid the problem. His original post showed that this happens across various operating systems with matching versions (2.5.2 and 2.5.5 matched up, and then 2.5.2 and 2.5.5 mismatched, over SunOS and AIX). Could you try to reproduce this

Re: Problem with checksum failing on large files

2002-10-11 Thread Donovan Baarda
On Fri, Oct 11, 2002 at 03:26:45PM -0700, Terry Reed wrote: -----Original Message----- From: Derek Simkowiak [mailto:dereks;itsite.com] Sent: Friday, October 11, 2002 1:51 PM To: Terry Reed Cc: '[EMAIL PROTECTED]' Subject: Re: Problem with checksum failing on large files