Lee writes:
> I have been getting this error lately: Exchange-State.bkf: md4 doesn't
> match: will retry in phase 1; file removed
If the file changes during the backup, the transfer is re-attempted.
If it fails a second time, the file is removed. From your later email
it appears to succeed on the second attempt, so the message is
benign.
Older versions of rsync used block checksums that were too short during
phase 1, creating cases where this error occurred even when the file
didn't change. I believe the fix appeared in protocol version 27,
in rsync 2.6.0. Ah, I see from one of your later emails that you are
using protocol version 26, which explains why your phase 1 transfer
sometimes fails. I'm attaching an interesting analysis from 2002 of why
this is so - it's what triggered the change in protocol version 27 to
make the first-phase checksum length adaptive based on file size.
You should upgrade your client rsync to something less than 5 years old.
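To give a rough idea of what "adaptive based on file size" means, here is a
little Python sketch (not rsync's actual code; the function name and the
2^-10 target are just placeholders I made up) that picks a strong-checksum
length from the collision math in the forwarded analysis below:

    def pick_sum2_bytes(file_size, block_size=700, target=2**-10):
        # Illustrative only -- NOT rsync's real heuristic.
        # Choose how many bytes of the strong (MD4) checksum to send per
        # block so that the expected number of false matches (see the
        # forwarded analysis below) stays under `target`.  The 32-bit
        # rolling checksum is always present, so the first-pass checksum
        # is 32 + 8 * sum2_bytes bits wide.
        n_blocks = max(1, file_size // block_size)
        for sum2_bytes in range(2, 17):
            bits = 32 + 8 * sum2_bytes
            if n_blocks * file_size / 2.0 ** bits < target:
                return sum2_bytes
        return 16

    print(pick_sum2_bytes(1048576000))   # a 1GB file needs more than 2 bytes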
> When this error happens I am unable to use the web interface to click
> on my backup number. The interface just hangs; on any of the backups
> where I had no Xfer errors the interface works fine. What can I do to
> resolve this?
I can't explain that. Are there any errors in Apache's log file?
Are you using mod_perl?
Look in the directory of backups for that host (eg: /data/BackupPC/pc/HOST).
The backups are in the numbered subdirectories. Is there anything
unusual about the contents of the directory that fails?
Craig
---------- Forwarded message ----------
From: Craig Barratt
To: Donovan Baarda, Derek Simkowiak, Terry Reed
Cc: [EMAIL PROTECTED]
Date: Mon, 14 Oct 2002 00:36:36 -0700
Subject: Re: Problem with checksum failing on large files
craig> My theory is that this is expected behavior given the checksum size.
derek> Craig,
derek> Excellent analysis!
donovan> I was a bit concerned about his maths at first, but I did
donovan> it myself from scratch using a different approach and got
donovan> the same figures...
Ok, so the chance that two (different) blocks have the same first-pass
48-bit checksum is small, but significant (at least 6% for a 4GB file
with 700-byte blocks). This probably isn't enough to explain Terry's
problem.
But it just occurred to me that checksum collisions are only part of
the story. Things are really a lot worse.
Let's assume that the block checksums are unique (so we don't get
tripped up by this first problem). Let's also assume that the old
file is completely different from the new one, ie: no blocks at any
offset really match. So rsync will compare the checksum at every
byte offset in the file looking for any match. If there are nBlocks
blocks, each check has an nBlocks / 2^48 chance of a false match.
Since this test is repeated at every byte offset, the probability
that the file has no false matches is:
p = (1 - nBlocks / (2^48)) ^ fSize
where fSize is the size of the file (more precisely the exponent should
be (fSize - 700)).
Now for some numbers for 700 byte blocks:
- 100MB file (104857600 bytes). nBlocks = 149797. p = 0.945.
- 500MB file (524288000 bytes). nBlocks = 748983. p = 0.248.
- 1000MB file (1048576000 bytes). nBlocks = 1497966. p = 0.003.
So, on average, if you have a random "new" 1GB file and a random "old"
1GB file, and you rsync them, the 1st phase will fail 99.7% of the
time.
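Here is a quick Python sketch of that calculation if anyone wants to plug
in other numbers (it's just the formula above, with 700-byte blocks
assumed):

    def p_no_false_match(file_size, block_size=700, changed_fraction=1.0):
        # p = (1 - nBlocks / 2^48) ^ (number of byte offsets checked)
        n_blocks = round(file_size / block_size)
        checks = int(file_size * changed_fraction)
        return (1 - n_blocks / 2.0 ** 48) ** checks

    for mb in (100, 500, 1000):
        print(mb, "MB:", round(p_no_false_match(mb * 1048576), 3))
    # prints roughly 0.946, 0.248, 0.004 - in line with the figures above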
Someone could test this theory: generate two random 500MB files and
rsync them. Try it a few times. I claim that on average the first
pass will fail around 75% of the time.
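Something like this would do it (a rough sketch only; the file names, the
500MB size, and the --no-whole-file / --block-size=700 flags to force the
delta algorithm on a local copy are my assumptions - running it over ssh
would work too):

    import os, subprocess

    def make_random_file(path, size=500 * 1048576, chunk=1 << 20):
        # Fill the file with random bytes so no blocks genuinely match.
        with open(path, "wb") as f:
            left = size
            while left > 0:
                n = min(chunk, left)
                f.write(os.urandom(n))
                left -= n

    make_random_file("old.dat")
    make_random_file("new.dat")

    # Sync new.dat onto the existing old.dat.  --no-whole-file forces the
    # delta algorithm even for a local copy; --block-size=700 matches the
    # block size assumed above.  With a protocol-26 rsync the first pass
    # should fail (and get retried) roughly 75% of the time.
    subprocess.run(["rsync", "-v", "--no-whole-file", "--block-size=700",
                    "new.dat", "old.dat"], check=True)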
Things get a lot better when the files are very similar. For each
block that matches, rsync skips the whole block (eg, 700 bytes)
before it starts looking for matching checksums. So for a file
that is identical it only does nBlocks checks, not fSize checks
(700 times fewer).
I recall from Terry's output that the number of bytes transferred after
the two attempts was roughly the same as the file size, so about half
the file is different. In this case, about fSize/2 lookups will be
done, giving
p = (1 - nBlocks / (2^48)) ^ (fSize/2)
which for the 1GB case is about 0.06 (ie: a 94% chance the 1st pass fails).
For a 1GB file with 4096-byte blocks and about half the file
changed, the probability of the first pass working is about 62%,
which is still not great. So just doing a single test with a
4096-byte block size might not confirm or contradict my hypothesis.
The probability does go up to about 97% with a 64K block size.
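Same formula again, varying the block size for the half-changed 1GB case:

    fsize = 1048576000                      # 1GB file, half of it changed
    for block_size in (700, 4096, 65536):
        n_blocks = fsize // block_size
        p = (1 - n_blocks / 2.0 ** 48) ** (fsize // 2)
        print(block_size, round(p, 2))      # roughly 0.06, 0.62, 0.97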
If my new hypothesis is correct we definitely need to increase the size
of the first-pass checksum for files bigger than maybe 50MB.
Craig