File Corruption thoughts.

Donald Sharp Wed, 03 May 2000 19:17:42 -0700
How are people detecting file corruption within cvs?
We recently upgraded from cvs 1.8.1 and discovered that
cvs 1.10 is much more stringent about file corruption than
1.8.1 was, i.e. cvs commands that worked with 1.8.1 would
fail with 1.10 (for instance, cvs 1.8.1 is content to add
new tags to a corrupted file; in some situations 1.10 will
not).  Upon seeing this I wrote a little perl script to
cycle through our repository and try to determine whether
each file was corrupted.  There were two goals to this:
a) find corrupted files proactively by running the script
once a month or so, which would let us easily retrieve them
from backups; and b) identify the corrupted files in the
repository and attempt to fix them.

What the script does is this:

        for each file in the repository
        {
                do a cvs log and grab all the revision numbers
                if( cvs log failed )
                {
                        the file is corrupted; note the fact and
                        continue.
                }

                for each interesting revision
                {
                        see if cvs can successfully recreate the file
                        if( cannot recreate )
                        {
                                the file is corrupted; note the fact and continue
                        }
                }
        }
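The loop above could be sketched in Python roughly as follows.  This is
a minimal sketch under several assumptions: instead of `cvs log` it runs
the RCS tools `rlog` and `co -p<rev>` directly on each `,v` file in the
repository, so those tools must be installed, and RCS's parser may not
flag exactly the same problems cvs 1.10 does.  The names `run`,
`check_file`, `scan` and the `pick_revisions` hook (which chooses the
revisions to rebuild) are all hypothetical; passing `lambda revs: revs`
rebuilds every revision.

```python
import os
import re
import subprocess
from typing import Callable, Dict, List, Sequence, Tuple

# A runner takes an argv list and returns (exit status, stdout text).
Runner = Callable[[Sequence[str]], Tuple[int, str]]

# rlog prints one "revision N.N..." line per revision.
REVISION_LINE = re.compile(r"^revision (\d+(?:\.\d+)+)", re.MULTILINE)


def run(argv: Sequence[str]) -> Tuple[int, str]:
    """Run a command, discarding stderr; return (exit status, stdout)."""
    proc = subprocess.run(list(argv), stdout=subprocess.PIPE,
                          stderr=subprocess.DEVNULL)
    # Decode leniently: co -p may print binary file contents.
    return proc.returncode, proc.stdout.decode("utf-8", errors="replace")


def check_file(path: str,
               pick_revisions: Callable[[List[str]], List[str]],
               runner: Runner = run) -> List[str]:
    """Return the problems found in one ,v file (empty list = looks OK)."""
    # Assumption: rlog on the ,v file stands in for "cvs log".
    status, log_text = runner(["rlog", path])
    if status != 0:
        return ["log failed"]  # the file is corrupted; note it and move on
    problems = []
    for rev in pick_revisions(REVISION_LINE.findall(log_text)):
        # co -p<rev> writes that revision to stdout without touching files.
        status, _ = runner(["co", "-q", "-p" + rev, path])
        if status != 0:
            problems.append("cannot recreate " + rev)
    return problems


def scan(root: str,
         pick_revisions: Callable[[List[str]], List[str]],
         runner: Runner = run) -> Dict[str, List[str]]:
    """Walk a repository and map each suspect ,v file to its problems."""
    report: Dict[str, List[str]] = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            if name.endswith(",v"):
                path = os.path.join(dirpath, name)
                problems = check_file(path, pick_revisions, runner)
                if problems:
                    report[path] = problems
    return report
```

Usage would look like `scan("/path/to/repository", lambda revs: revs)`,
where the path is a placeholder.  The `runner` parameter exists so the
command invocation can be swapped out, e.g. to call `cvs` instead or to
test the logic without a real repository.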

How I determine Interesting Revisions:

        A revision is interesting if it is:

        - The lowest revision number on the mainline
          ( cvs stores the mainline as reverse differences from
          the head, so rebuilding the lowest revision applies
          every mainline difference )
        - The highest revision number on a branch
          ( cvs stores branch differences forward from the branch
          point, so rebuilding the highest revision applies every
          difference on that branch )
        - If a branch is pulled from an already existing
          'interesting revision', that 'interesting revision'
          doesn't need to be checked separately.
          ( I am assuming that the chain of differences is correct )
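A minimal Python sketch of those three rules, assuming revision numbers
arrive as the dotted strings `cvs log` prints, that mainline revisions
are the ones with exactly two fields, that a revision's branch is the
revision minus its last field, and that a branch's branch point is the
branch number minus its last field.  `interesting_revisions` is a
hypothetical helper, not part of cvs.

```python
from typing import Dict, Iterable, List, Set, Tuple

Rev = Tuple[int, ...]


def parse_rev(text: str) -> Rev:
    """Turn "1.2.2.1" into (1, 2, 2, 1) so revisions compare numerically."""
    return tuple(int(field) for field in text.split("."))


def format_rev(rev: Rev) -> str:
    return ".".join(str(field) for field in rev)


def interesting_revisions(revisions: Iterable[str]) -> List[str]:
    """Pick the revisions worth rebuilding (hypothetical helper)."""
    revs = [parse_rev(r) for r in revisions]

    # Rule 1: the lowest mainline revision (mainline = two fields, e.g. 1.7).
    candidates: Set[Rev] = set()
    mainline = [r for r in revs if len(r) == 2]
    if mainline:
        candidates.add(min(mainline))

    # Rule 2: the highest revision on each branch; the branch number is
    # the revision minus its last field (1.2.2.3 is on branch 1.2.2).
    branch_tips: Dict[Rev, Rev] = {}
    for r in revs:
        if len(r) > 2:
            branch = r[:-1]
            if branch not in branch_tips or r > branch_tips[branch]:
                branch_tips[branch] = r
    candidates.update(branch_tips.values())

    # Rule 3: drop any candidate that is the branch point of a branch
    # (branch 1.2.2 grows from 1.2); rebuilding that branch's tip passes
    # through it, assuming the chain of differences is correct.
    branch_points = {branch[:-1] for branch in branch_tips}
    return [format_rev(r) for r in sorted(candidates - branch_points)]
```

For example, given 1.1, 1.2, 1.3, 1.1.2.1, 1.2.2.1 and 1.2.2.2 it keeps
only 1.1.2.1 and 1.2.2.2: 1.1 is dropped because branch 1.1.2 starts
from it.  Revisions are compared as integer tuples rather than strings
so that 1.10 correctly sorts after 1.9.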

Is this an appropriate way to go about doing this?  Are there
better ways to do this?  I am asking because one of our
repositories has 270k+ revisions (which I winnowed down to a
little over 100k interesting revisions), and this repository
isn't our biggest by any stretch of the imagination.  The
script took a little over 3 days to run on it, and frankly
that's just a little too long.  Any help or thoughts would be
appreciated.

Now, having said that, what is the best strategy for fixing a
file by hand, given that I don't *know* when the file got
corrupted, and some of these files go back 6-7 years?

Thanks!

donald