On Tue, Jul 31, 2012 at 11:26 AM, Philip Martin
<philip.mar...@wandisco.com>wrote:

> Stefan Fuhrmann <stefan.fuhrm...@wandisco.com> writes:
>
> > On Mon, Jul 30, 2012 at 12:40 PM, Philip Martin
> > <philip.mar...@wandisco.com>wrote:
> >
> >> When the commit process finds a representation in the rep-cache the only
> >> sanity check that happens is that the revision must be less than or
> >> equal to HEAD.  We don't check that the offset is valid:
> >>
> >>  echo foo > foo
> >>  svnadmin create repo
> >>  svn import -mm foo file://`pwd`/repo/A/f
> >>  sqlite3 repo/db/rep-cache.db "update rep_cache set offset = 4"
> >>  svn import -mm foo file://`pwd`/repo/A/g
> >>
> >> or that the checksum at that offset matches:
> >>
> >>  echo foo > foo
> >>  echo bar > bar
> >>  svnadmin create repo
> >>  svn import -mm foo file://`pwd`/repo/A/f
> >>  sqlite3 repo/db/rep-cache.db "update rep_cache set
> >> hash='e242ed3bffccdf271b7fbaf34ed72d089537b42f'"
> >>  svn import -mm bar file://`pwd`/repo/A/g
> >>
> >> In both cases corruption in the rep-cache leads to corruption in the
> >> revision files but that corruption is not detected by commit process
> >> even though subsequent checkouts will fail.
> >>
> >
> > Has that kind of corruption been observed in the wild?
>
> I'm not aware of any reports.  I did start looking at this because of a
> repository with an incorrect offset referring to an older revision file
> but that now appears to be a different problem.
>
> >> Should we do more sanity checking?  We are using rep-cache to discard
> >> data supplied by the client on the basis that it is already present in
> >> the repository.  Should we check that the offset really is a
> representation
> >> with the expected checksum?
> >>
> >
> > The full verification would look like this:
> > * recursively enumerate all noderevs in the rep's revision
> > * check that at least one uses the rep
> > * read the rep and verify the checksum
> >
> > This seems quite costly to do during commit - in particular during
> > imports and similar mass commit operations.
>
> On commit we would only want to verify that a revision/offset obtained
> from the cache really was a representation.


That is hard to truly verify. But from a practical POV,
you are right - we can check whether (offset, length)
points to something that

* fully lies within the respective file
* starts with "PLAIN\n", "DELTA\n" or "DELTA $someValidRepReference"
* ends with "ENDREP\n"

Please do *not* check that offsets lie within a specific
range but only within the respective file / pack file.
I'm working on an experimental tool that will reorder
the elements within pack files and there will be no
upper limit for representation offsets of a given revision
other than the pack file length itself.


> I suppose we might want to
> construct the full-text corresponding to the representation to verify
> the checksum.


Yes. Worst thing that might happen is a crash because
we didn't check some buffer size. combined_window_cache
will reduce the I/O overhead for frequently checked reps
such as file properties.


> We do not need to verify the whole revision or the whole
> rep-cache.
>

I only thought that it is hard to verify that a given offset
range actually *is* a representation. To verify that, there
must be some noderev pointing to it. Thus, we had to
enumerate all noderevs until we found a suitable one.

-- Stefan^2.

-- 
*Join us this October for Subversion Live
2012<http://www.wandisco.com/svn-live-2012>– 2 full days of training,
networking, live demos and more! 25%
off before Aug. 10th with discount code “earlybird.”

*Certified & Supported Apache Subversion Downloads:
http://www.wandisco.com/subversion/download

Reply via email to